Method for statistically reconstructing images from a plurality of transmission measurements having energy diversity and image reconstructor apparatus utilizing the method

ABSTRACT

A method for statistically reconstructing images from a plurality of transmission measurements having energy diversity and image reconstructor apparatus utilizing the method are provided. A statistical (maximum-likelihood) method for dual-energy X-ray CT accommodates a wide variety of potential system configurations and measurement noise models. Regularized methods (such as penalized-likelihood or Bayesian estimations) are straightforward extensions. One version of the algorithm monotonically decreases the negative log-likelihood cost function each iteration. An ordered-subsets variation of the algorithm provides a fast and practical version. The method and apparatus provide material characterization and quantitatively accurate CT values in a variety of applications. The method and apparatus provide improved noise/dose properties.

CROSS-REFERENCE TO RELATED APPLICATION

This application claims the benefit of U.S. provisional application Serial No. 60/358,233, filed Feb. 20, 2002.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with government support under NIH Grant Nos. CA 60711 and CA 65637. The Government has certain rights in the invention.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to methods for statistically reconstructing images from a plurality of transmission measurements such as scans having energy diversity and image reconstructor apparatus utilizing the method. The invention can accommodate a wide variety of system configurations and measurement noise models including X-ray CT scanners and systems that use gamma sources with multiple energies, such as some SPECT transmission scans.

2. Background Art

Tomographic images of the spatial distribution of attenuation coefficients in the human body are valuable for medical diagnosis. Most hospitals have CT scanners for producing such images. Attenuation images are also useful in a variety of scientific studies, in industry for non-destructive evaluation, and for security purposes like baggage inspection. X-ray CT scanners are also being integrated into SPECT and PET scanners to provide accurate attenuation correction for emission image reconstruction and for precise anatomical localization of the functional features seen in the emission images.

Material attenuation coefficients depend on the energy of the incident photons. In clinical X-ray CT imaging, the source of the X-ray photons, bremsstrahlung radiation, has an inherently broad energy spectrum. Each photon energy is attenuated differently by the object (body). When such transmission measurements are processed by conventional image reconstruction methods, this energy-dependent effect causes beam-hardening artifacts and compromises quantitative accuracy. To avoid these difficulties, one could employ a radioisotope source with a monoenergetic spectrum, but the practical intensity is usually much lower leading to lower SNR. Recently developed fluorescence-based X-ray sources have somewhat improved intensity but still are lower than clinical CT sources. Higher intensities are obtained from monoenergetic synchrotron sources, which are expensive currently. Many gamma-emitting radioisotopes also emit photons at several photon energies.

U.S. Pat No. 6,507,633 discloses a statistical method for reconstructing images from a single measured X-ray CT sinogram. That method was the first statistical approach to include a complete polyenergetic source spectrum model in a penalized-likelihood framework with a monotonically converging iterative algorithm. DeMan et al. in “An Iterative Maximum-Likelihood Polychromatic Algorithm for CT,” IEEE TR. MED. IM., 20(10):999-1008, October 2001 also proposed a solution to that problem based on a somewhat different object model and an algorithm that may not be monotonically converging. When only a single sinogram (for a given polyenergetic source spectrum) is available, usually one must make some fairly strong assumptions about the object's attenuation properties to perform reconstruction. For example, one may segment the object into soft tissue and bone voxels or mixtures thereof.

The energy dependence of attenuation coefficients is an inconvenience in conventional X-ray CT. Alvarez and Macovski, as disclosed in U.S. Pat No. 4,029,963, showed how to approximate the energy dependence of attenuation coefficients in terms of a Compton scattering component and a photoelectric absorption component (or, roughly equivalently, electron density and atomic number) and how to separate these two components in the sinogram domain prior to tomographic reconstruction. The separate component images could then be combined to synthesize a displayed CT image at any energy of interest. Later enhancements included noise suppression, considerations in basis material choices, energy optimization, beam-hardening assessment and correction, algorithm acceleration, scatter correction, and evaluation of precision.

Numerous potential applications of dual-energy imaging have been explored, including rock characterization for petrochemical industrial applications, soil sample analysis in agriculture, bone mineral density measurements, bone marrow composition, adipose tissue volume determinations, liver iron concentration, explosives detection, detection of contrast agents in spinal canal, non-destructive evaluation, body composition, carotid artery plaques, and radioactive waste drums. Accurate correction of Compton scatter in X-ray CT may also benefit from dual-energy information.

More recently, there has been considerable interest in using X-ray CT images to correct for attenuation in SPECT and PET image reconstruction. In these contexts, one must scale the attenuation values in the X-ray CT images from the X-ray photon energies to the energies of the gamma photons used in SPECT and PET imaging. Kinahan et al. in “Attenuation Correction for a Combined 3D PET/CT Scanner,” MED. PHYS., 25(10):2046-53, October 1998 have noted that accurate scaling from X-ray to PET energies may require dual-energy X-ray CT scans. This is particularly challenging in the “arms down” mode of PET scanning. If the primary purpose of the dual-energy X-ray CT scan is PET attenuation correction (rather than diagnosis), then one would like to use low X-ray doses, resulting in the need for statistical image reconstruction methods to minimize image noise.

The conventional disadvantage of dual-energy methods is the increased scan time if two (or more) separate scans are acquired for each slice. This doubling in scan time can be avoided by methods such as alternating the source energy spectra between each projection angle or between each slice or conceivably in other arrangements. Special split detectors have also been proposed.

Prior to the 1990's, all work on dual-energy X-ray CT used the FBP reconstruction method. In the early 1990's, there were a few iterative methods published for dual-energy CT reconstruction. An iterative method to achieve beam-hardening correction and decomposition into basis materials is known. Markham and Fryar in “Element Specific Imaging in Computerized Tomography Using a Tube Source of X-Rays and a Low Energy-Resolution Detector System,” NUCL. INSTR. METH., A324(1):383-8, January 1993 applied the ART algorithm. Kotzki et al. in “Prototype of Dual Energy X-Ray Tomodensimeter for Lumbar Spine Bone Mineral Density Measurements; Choice of the Reconstruction Algorithm and First Experimental Results,” PHYS. MED. BIOL., 37(12):2253-65, December 1992 applied a conjugate gradient algorithm. These iterative approaches treat the problem as “finding the solution to a system of equations.” These algebraic approaches can improve the accuracy relative to FBP methods, but they do not directly address the radiation dose issue. In contrast, in statistical image reconstruction approaches, the problem is posed as finding the images that best fit the measurements according to the (possibly nonlinear) physical model and a statistical model. Proper statistical modeling can lead to lower noise images, thereby enabling reductions in X-ray dose to the patient.

Statistical approaches have been extensively investigated, particularly in the last ten years, for monoenergetic transmission measurements. Recently, Clinthorne and Sukovic have investigated iterative algorithms for dual-energy and triple-energy CT reconstruction based on a weighted least-squares approach, including object-domain constraints in the following papers:

“A Constrained Dual-Energy Reconstruction Method for Material-Selective Transmission Tomography,” NUCL. INSTR. METH. PHYS. RES. A., 351(1):347-8, December 1994;

“Design of an Experimental System for Dual-Energy X-Ray CT,” In PROC. IEEE NUC. SCI. SYMP. MED. IM. CONF., Vol. 2, pp. 1021-2, 1999; and

“Penalized Weighted Least-Squares Image Reconstruction in Single and Dual-Energy X-Ray Computed Tomography,” IEEE TR. MED. IM., 19(11):1075-81, November 2000.

That work assumed monoenergetic measurements. Gleason et al., in the paper “Reconstruction of Multi-Energy X-Ray Computer Tomography Images of Laboratory Mice,” IEEE TR. NUC. SCI., 46(2):1081-6, August 1999 hint at the need for ML solutions to the multi-energy problem. Table 1 summarizes the various dual-energy reconstruction methods:

TABLE 1
PRIOR ART DUAL-ENERGY X-RAY CT RECONSTRUCTION METHODS

Reconstruction    Data
Algorithm         Preprocessed                   Unprocessed
FBP               Alvarez & Macovski, 1976       -
                  (many others since)
Algebraic         Kotzki et al., 1992            -
                  Markham et al., 1993
Statistical       Clinthorne & Sukovic, 2000     (Monoenergetic)

SUMMARY OF THE INVENTION

An object of the present invention is to provide a method for statistically reconstructing images from a plurality of transmission measurements having energy diversity and image reconstructor apparatus utilizing the method wherein, by using multiple measurements with “energy diversity,” i.e., a set of two or more energy spectra, one can avoid segmentation, eliminating one potential source of errors.

In carrying out the above object and other objects of the present invention, a method for statistically reconstructing images from a plurality of transmission measurements having energy diversity is provided. The method includes providing a plurality of transmission measurements having energy diversity. The method also includes processing the measurements with an algorithm based on a statistical model which accounts for the energy diversity to obtain at least one final component image which has reduced noise.

The method may further include providing a cost function based on the statistical model. The cost function may be minimized during the step of processing.

The cost function may have a gradient which is calculated during the step of processing. The gradient may be calculated by backprojecting.

The method may further include analyzing the at least one final component image.

The method may further include calibrating spectra of the measurements to obtain calibration data. The step of processing may utilize the calibration data.

The method may further include displaying the at least one final component image.

The gradient may be calculated by approximately using a subset of the measurements, such as an ordered subset of projection views, to accelerate the algorithm.

The cost function may have a regularizing penalty term.

The measurements may be dual-energy X-ray CT scans or may be transmission scans with differing energy spectra, such as X-ray sources with different tube voltages or different filtrations, or gamma-ray sources with multiple energies.

The cost function may include a log-likelihood term.

The cost function may consist solely of a log-likelihood function, which is called maximum likelihood reconstruction, or the cost function may consist of both a log-likelihood function and a regularizing penalty function, which is called penalized-likelihood or maximum a posteriori image reconstruction.

The method may further include preprocessing the measurements prior to the step of processing to obtain preprocessed measurements. The preprocessed measurements may be processed in the step of processing to obtain the at least one component image.

The log likelihood term may be a function that depends on a model for an ensemble mean of the transmission measurements, and the model incorporates characteristics of an energy spectrum.

The log-likelihood term may be a function of the transmission measurements, prior to any pre-processing such as taking a logarithm of the measurements.

The gradient of the cost function may be calculated using a parametric approximation, such as polynomials, tables, or piecewise polynomials.

The regularizing penalty term may be based on quadratic functions of linear combinations of voxel values or nonquadratic (edge-preserving) functions of such combinations.

Parameter constraints such as non-negativity of voxel values may be enforced during or after minimization of the cost function.

The processing step may be based on the preprocessed measurements and may use a cost function based on a statistical model for variability of the preprocessed measurements.

Further in carrying out the above objects and other objects of the present invention, an image reconstructor apparatus for statistically reconstructing images from a plurality of transmission measurements having energy diversity is provided. The apparatus includes means for providing a plurality of transmission measurements having energy diversity. The apparatus further includes means for processing the measurements with an algorithm based on a statistical model which accounts for the energy diversity to obtain at least one final component image which has reduced noise.

The apparatus may further include means for providing a cost function based on the statistical model, and the cost function may be minimized by the means for processing.

The apparatus may further include means for calculating a gradient of the cost function.

The means for calculating may calculate the gradient by backprojecting.

The apparatus may further include means for analyzing the at least one final component image.

The apparatus may further include means for calibrating spectra of the measurements to obtain calibration data, and the means for processing may utilize the calibration data.

The apparatus may further include a display for displaying the at least one final component image.

The means for calculating may calculate the gradient approximately using a subset of the measurements, such as an ordered subset of projection views, to accelerate the algorithm.

The cost function may have a regularizing penalty term.

The measurements may be transmission scans with differing energy spectra, such as X-ray sources with different tube voltages or different filtrations, or gamma-ray sources with multiple energies.

The cost function may include a log-likelihood term, or the cost function may consist of both a log-likelihood function and a regularizing penalty function, which is called penalized-likelihood or maximum a posteriori image reconstruction.

The cost function may include a maximum likelihood or penalized likelihood algorithm.

The apparatus may further include means for preprocessing the measurements to obtain preprocessed measurements. The preprocessed measurements may be processed by the means for processing to obtain the at least one component image.

The log likelihood term may be a function that depends on a model for an ensemble mean of the transmission measurements, and the model incorporates characteristics of an energy spectrum.

The log-likelihood term may be a function of the transmission measurements, prior to any pre-processing such as taking a logarithm of the measurements.

The gradient of the cost function may be calculated using a parametric approximation, such as polynomials, tables, or piecewise polynomials.

The regularizing penalty term may be based on quadratic functions of linear combinations of voxel values or nonquadratic (edge-preserving) functions of such combinations.

Parameter constraints such as non-negativity of voxel values may be enforced during or after minimization of the cost function.

The means for processing may process the preprocessed measurements and may use a cost function based on a statistical model for variability of the preprocessed measurements.

The above objects and other objects, features, and advantages of the present invention are readily apparent from the following detailed description of the best mode for carrying out the invention when taken in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1a is a block diagram flow chart illustrating a statistically polyenergetic reconstruction method of the present invention;

FIG. 1b is a schematic block diagram of a reconstruction apparatus of the present invention for use in a basic CT subsystem;

FIGS. 2a and 2b are graphs which show X-ray source spectra I_(m)(ε) used in a computer simulation; the dashed vertical lines are located at the effective energy $\bar{\varepsilon}_{m}$ of each spectrum;

FIG. 3 is a graph which shows mass attenuation coefficients β_(l)(ε) of cortical bone and soft tissue used in computer simulation;

FIGS. 4a and 4b are 3D graphs of functions ƒ_(im) in Equation (11) corresponding to FIG. 2 and FIG. 3; the units of s₁ are [cm²/g];

FIG. 5 is a scatter plot of (F₁*(s₁,s₂),F₂*(s₁,s₂)) pairs of Equation (71) for uniformly-spaced (s₁,s₂) pairs; for a monoenergetic source these points would lie on a uniform grid; the nonlinearities are due to beam hardening effects;

FIGS. 6a-6d illustrate a true object x^(true) used in computer simulation;

FIGS. 7a and 7b show simulated dual-energy CT sinogram measurements Y_(mi);

FIGS. 8a and 8b show estimates $\hat{f}_{im}$ computed from noisy simulated dual-energy measurements (i.e., smoothed and log-processed sinogram measurements);

FIGS. 9a-9f show estimates ŝ_(i) computed from noisy simulated dual-energy measurements;

FIGS. 10a-10f show FBP dual-energy reconstructions of soft tissue and bone components;

FIGS. 11a-11f show penalized likelihood dual-energy reconstructions of soft tissue and bone components; the density units are 1/cm; and

FIG. 12 shows graphs of cost function decrease versus iteration n for 1-subset and 4-subset algorithms with precomputed denominator.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

In general, the method described herein is a novel extension of statistical image reconstruction approaches from the monoenergetic case to the case of measurements with energy diversity. A statistical (maximum likelihood or penalized likelihood) method for reconstructing an “attenuation map” $\mu(\vec{x}, \varepsilon)$ from polyenergetic X-ray (or gamma-ray) tomographic measurements is described herein.

Like most dual-energy reconstruction methods, the method of the invention typically requires some knowledge about the X-ray beam spectrum. This spectrum can be measured directly or estimated from calibration phantoms. In the final analysis, rather than requiring the entire spectrum, the algorithm typically requires only the nonlinear function ƒ_(im) given in Equation (11) below and its derivative. It may be feasible to measure ƒ_(im) empirically for a given scanner. The method does not exhibit inordinate sensitivity to imperfections in the source spectrum model.

Physical Models

Let $\mu(\vec{x}, \varepsilon)$ denote the object's linear attenuation coefficient as a function of spatial position $\vec{x}$ and photon energy ε. The ideal tomographic imaging system would provide a complete description of μ for $\vec{x}$ in the entire field of view and for a wide range of energies ε. In practice, the goal is to reconstruct an estimate of μ from a finite collection of “line-integral” measurements. (For simplicity, it is assumed that the object is static, and any temporal variations are ignored, although it is possible to generalize the method to the dynamic case.)

A. General Measurement Physical Model

The following general physical model for the measurements is assumed. One collects transmission tomographic measurements with N_(s)≧1 different incident spectra, e.g., by changing the X-ray source voltage and/or the source filtration.

Alternately, one could use multiple energy windows in an energy-sensitive detector, such as in SPECT transmission scans with a multi-energy radioisotope. In these cases, there is really only one “incident spectrum,” but it is described as a collection of N_(s) different incident spectra since that is the usual framework in dual-energy X-ray CT.

For each incident spectrum, one records tomographic “line integrals” at N_(d) radius-angle pairs, i.e., a sinogram is formed (not necessarily completely sampled). Let Y_(mi) denote the measurement for the ith ray for the mth incident spectrum, m=1, . . . , N_(s), i=1, . . . , N_(d). For notational simplicity, the case is presented where the same number of rays is recorded for each incident spectrum. The method generalizes easily to the case where the number or configuration of rays is different for different incident spectra, which may be useful in practice. One refers to $\{Y_{mi}\}_{i=1}^{N_{d}}$ as the measurements for the “mth incident spectrum.”

One assumes that the measurements are random variables with the following ensemble means: $\begin{matrix} {E_{\mu}\left[ Y_{mi} \right] = \bar{y}_{mi} \overset{\Delta}{=} \int I_{mi}(\varepsilon) \exp\left( -\int_{L_{mi}} \mu(\vec{x},\varepsilon)\,dl \right) d\varepsilon + r_{mi},} & (1) \end{matrix}$

where $\int_{L_{mi}} \cdot \, dl$ denotes the “line integral” function for the ith position and the mth energy, I_(mi)(ε) denotes the product of the source energy spectrum and the detector gain (for the mth incident spectrum), and r_(mi) denotes “known” additive background contributions such as room background, dark current, and/or scatter. This is an idealization since it ignores the nonlinearity caused by the exponential edge-gradient effect. One could extend the algorithm derivation to account for this effect, but for simplicity the focus here is on the polyenergetic aspects. Typically L_(mi) will be independent of m, except in some systems where alternate projection views have different energy spectra. One treats each I_(mi)(ε) and r_(mi) as known and non-negative. Determining I_(mi)(ε) in practice may require careful calibration procedures. One usually determines r_(mi) by some preprocessing steps prior to iterative reconstruction. For example, the r_(mi)'s may be equated to known constants related to the “shifted Poisson” approach based on detector noise models.
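By way of illustration only, the ensemble mean of model (1) for a single ray can be sketched numerically by replacing the energy integral with a sum over discrete energy bins. The function name, the argument names, and the binning are assumptions introduced for this sketch and are not part of the described method.

```python
import math

def mean_measurement(spectrum, mu_line_integrals, background=0.0):
    """Discretized sketch of model (1) for one ray and one incident spectrum.

    spectrum          : I_mi(eps_k) * d_eps for each (hypothetical) energy bin k
    mu_line_integrals : line integral of mu(x, eps_k) along the ray, per bin
    background        : the additive term r_mi
    """
    attenuated = sum(
        intensity * math.exp(-line_integral)
        for intensity, line_integral in zip(spectrum, mu_line_integrals)
    )
    return attenuated + background

# With no attenuating object the mean is the total intensity plus r_mi.
blank_scan = mean_measurement([1.0, 2.0], [0.0, 0.0], background=0.5)
```

Because each energy bin is attenuated by a different amount, the result is in general not a single exponential of any one line integral, which is the beam-hardening nonlinearity discussed above.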

Methods are described herein for reconstructing μ from tomographic measurements with energy diversity under log-likelihood models based on the general physical model (1). With one exception, all previously published approaches have been based on simplifications of (1) or of the associated log-likelihoods. Those “conventional” simplifications are described first, followed by the new approach.

B. Basis Material Decomposition (Object Model)

One has only a finite set of measurements whereas μ is a continuous function of energy and spatial location. Parametric statistical estimation requires some form of discretization of μ. For the polyenergetic case, one must parameterize both the spatial and energy dependencies. To our knowledge, all prior work has considered parameterizations that use basis functions that are separable in the spatial and energy (or material density) dimensions. Separable approaches seem simple and natural. For example, Alvarez and Macovski assume that $\begin{matrix} {{{\mu \quad \left( {\overset{\rightarrow}{x},ɛ} \right)} = {\sum\limits_{l = 1}^{L}\quad {f_{l}\quad (ɛ)\quad \alpha_{l}\quad \left( \overset{\rightarrow}{x} \right)}}},} & (2) \end{matrix}$

where each f_(l)(ε) depends only on energy but not on spatial position, $\alpha_{l}(\vec{x})$ is the corresponding coefficient that varies spatially, and L is usually 2. Alternatively, Clinthorne et al. assume that $\begin{matrix} {\mu(\vec{x},\varepsilon) = \sum\limits_{l = 1}^{L} \beta_{l}(\varepsilon)\,\rho_{l}(\vec{x}),} & (3) \end{matrix}$

where β_(l)(ε) denotes the energy-dependent mass-attenuation coefficient of the lth material type (e.g., soft tissue, bone mineral, contrast agent, etc.), and $\rho_{l}(\vec{x})$ is the density of that material at spatial location $\vec{x}$. This latter parameterization facilitates enforcing physical constraints such as non-negativity. Both of the above parameterizations use bases that are separable in space/energy. This separability property is needed for the type of algorithm derived in previous work. The more general algorithm derived herein does not require separability. A more general parameterization is described in (23) below after reviewing conventional approaches.

The conventional approach to dual-energy X-ray CT is to substitute (2) or (3) into (1). As described in detail hereinbelow, this yields a system of equations in the line integrals through the spatial basis functions. One can solve these equations numerically in sinogram space, and then perform FBP reconstruction to form images of the material components.

C. Conventional Monoenergetic Approximation

Another way to simplify (1) is to assume that each incident spectrum is monoenergetic. That model is realistic for some radioisotope sources, but is a considerable idealization of X-ray sources. Mathematically, the monoenergetic assumption is expressed

$\begin{matrix} {I_{mi}(\varepsilon) = I_{mi}\,\delta(\varepsilon - \varepsilon_{m}),} & (4) \end{matrix}$

where ε_(m) denotes the energy of the mth setting, m=1, . . . , N_(s). Under this assumption, the model (1) simplifies to

$\begin{matrix} {\bar{y}_{mi} = I_{mi} \exp\left( -\int_{L_{mi}} \mu(\vec{x},\varepsilon_{m})\,dl \right) + r_{mi}.} & (5) \end{matrix}$

In this case, one can estimate the line integrals $l_{mi} \overset{\Delta}{=} \int_{L_{mi}} \mu(\vec{x},\varepsilon_{m})\,dl$ by a simple logarithm: $\begin{matrix} {\hat{l}_{mi} \overset{\Delta}{=} \log\left( \frac{I_{mi}}{Y_{mi} - r_{mi}} \right) \approx \int_{L_{mi}} \mu(\vec{x},\varepsilon_{m})\,dl.} & (6) \end{matrix}$

Again, one could apply the FBP method to reconstruct $\mu(\vec{x},\varepsilon_{m})$ from $\{\hat{l}_{mi}\}_{i=1}^{N_{d}}$.
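As a minimal sketch of the logarithmic estimate (6), the following function recovers a line integral from a single measurement under the monoenergetic model (5). The function name and the guard against measurements at or below the background level are assumptions added for the illustration; the text does not specify how such measurements are handled.

```python
import math

def log_line_integral(y, intensity, background=0.0):
    """Sketch of estimate (6): log(I_mi / (Y_mi - r_mi)).

    The logarithm is undefined unless Y_mi exceeds r_mi, so this
    illustrative version rejects such measurements explicitly.
    """
    if y <= background:
        raise ValueError("measurement must exceed the background term")
    return math.log(intensity / (y - background))

# A noiseless measurement generated from model (5) with line integral 1.5.
y_noiseless = 1000.0 * math.exp(-1.5) + 3.0
estimate = log_line_integral(y_noiseless, 1000.0, background=3.0)
```

For noiseless data the estimate reproduces the line integral; with noisy data it is only approximate, as the symbol ≈ in (6) indicates.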

Clinthorne and Sukovic combined (6) with (3) to formulate a penalized weighted least-squares image reconstruction method for dual-energy and triple-energy tomographic reconstruction. Their simulations matched the monoenergetic model (4), so the question of whether a monoenergetic approximation is adequate for iterative dual-energy tomographic image reconstruction is an open one. The algorithm proposed herein will facilitate comparisons between the full polyenergetic treatment and the simpler monoenergetic approximation.

The case of a single monoenergetic measurement, i.e., N_(s)=1 in (4), is the most extensively studied tomographic reconstruction problem, and numerous non-statistical and statistical methods have been proposed for this case.

To estimate μ by iterative statistical methods, one must eventually parameterize it. In the single monoenergetic case, one usually assumes $\begin{matrix} {\mu(\vec{x},\varepsilon_{1}) = \sum\limits_{j = 1}^{N_{p}} b_{j}(\vec{x})\,\mu_{j}} & (7) \end{matrix}$

for some spatial basis functions b_(j)(•), such as indicator functions over each pixel's support. Substituting into (5) yields $\begin{matrix} {\bar{y}_{1i} = I_{1i} \exp\left( -\sum\limits_{j = 1}^{N_{p}} a_{ij}\,\mu_{j} \right) + r_{1i},} & (8) \end{matrix}$

where

$\begin{matrix} {a_{ij} \overset{\Delta}{=} \int_{L_{i}} b_{j}(\vec{x})\,dl.} & (9) \end{matrix}$

The model (8) is used in “conventional” statistical methods for transmission image reconstruction.
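The discrete model (8) can be illustrated as follows. The sketch assumes a small dense system matrix stored as a list of rows, which is a simplification chosen for readability; the names used are hypothetical.

```python
import math

def mean_monoenergetic(system_matrix, mu, intensities, backgrounds):
    """Sketch of model (8): y_bar_i = I_i * exp(-sum_j a_ij * mu_j) + r_i.

    system_matrix : rows of a_ij, one row per ray i (see (9))
    mu            : pixel attenuation values mu_j
    """
    means = []
    for row, intensity, background in zip(system_matrix, intensities, backgrounds):
        line_integral = sum(a_ij * mu_j for a_ij, mu_j in zip(row, mu))
        means.append(intensity * math.exp(-line_integral) + background)
    return means

# Two rays through a two-pixel object; the second ray crosses both pixels.
y_bar = mean_monoenergetic([[1.0, 0.0], [1.0, 1.0]], [0.2, 0.3],
                           [100.0, 100.0], [0.0, 5.0])
```

In a practical system the matrix is large and sparse, so the inner sum would be carried out by a forward projector rather than by explicit row storage.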

D. Beam-Hardening Correction

U.S. Pat No. 6,507,633 discloses a method which combines (3) with the polyenergetic measurement model (1) in the single scan case (N_(s)=1) to develop a statistical method for X-ray CT image reconstruction with compensation for beam-hardening, assuming that the image can be segmented into soft-tissue and bone voxels. This same assumption is used in conventional non-statistical methods for beam-hardening correction. DeMan et al. proposed another statistical method for beam-hardening correction, assuming that all materials in the patient have spectral properties that are linear combinations of two basis materials. An advantage of energy diversity approaches (N_(s)>1) is that they eliminate the need for segmentation and other approximations that may hinder material characterization.

Preprocessing-Based Methods

Before describing the maximum-likelihood approach of the present invention in detail, two existing “preprocessing” approaches to dual-energy CT reconstruction are described. The first approach is the classical non-statistical method, and the second approach is the recently published weighted least-squares approach, including some extensions of that approach.

A. Conventional Dual-Energy Approach

Substituting (3) into (1) yields the following simplified model for the measurement means:

$\begin{matrix} {\bar{y}_{mi} = I_{mi}\, e^{-f_{im}(s_{i}(\rho))} + r_{mi},} & (10) \end{matrix}$

where

$\begin{matrix} {f_{im}(s_{i}) \overset{\Delta}{=} -\log\left( \frac{\int I_{mi}(\varepsilon) \exp\left( -\sum_{l} \beta_{l}(\varepsilon)\, s_{il} \right) d\varepsilon}{I_{mi}} \right),} & (11) \end{matrix}$

$\begin{matrix} {s_{i} \overset{\Delta}{=} \left( s_{i1},\ldots,s_{iL} \right), \quad s_{il}(\rho) \overset{\Delta}{=} \int_{L_{mi}} \rho_{l}(\vec{x})\,dl,} & (12) \end{matrix}$

for m=1, . . . ,N_(s) and l=1, . . . ,L, where the following total intensity is defined as: $\begin{matrix} {I_{mi} \overset{\Delta}{=} \int I_{mi}(\varepsilon)\,d\varepsilon.} & (13) \end{matrix}$
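The nonlinear function of Equation (11) can be sketched numerically by approximating the energy integrals with sums over discrete energy bins. The bin values below are made-up numbers chosen only to exercise the code, and the function and argument names are assumptions of this illustration.

```python
import math

def f_im(s, spectrum, beta):
    """Discretized sketch of (11).

    s        : component line integrals (s_i1, ..., s_iL)
    spectrum : I_mi(eps_k) * d_eps for each energy bin k
    beta     : beta[l][k], mass attenuation of material l in bin k
    """
    total_intensity = sum(spectrum)  # discretized version of (13)
    attenuated = sum(
        intensity * math.exp(-sum(beta_l[k] * s_l for beta_l, s_l in zip(beta, s)))
        for k, intensity in enumerate(spectrum)
    )
    return -math.log(attenuated / total_intensity)

# Hypothetical three-bin spectrum and two basis materials.
toy_spectrum = [0.6, 0.3, 0.1]
toy_beta = [[0.25, 0.20, 0.18], [0.60, 0.30, 0.20]]
f_single = f_im([2.0, 0.5], toy_spectrum, toy_beta)
f_double = f_im([4.0, 1.0], toy_spectrum, toy_beta)
```

For a single-bin (monoenergetic) spectrum the function reduces to the linear form Σ_l β_l s_il. For a polyenergetic spectrum it grows more slowly than linearly, so doubling the path lengths less than doubles ƒ_(im); this is the beam-hardening nonlinearity.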

Given noisy measurements Y_(mi), the natural approach to estimating the ƒ_(im)'s is to invert (10): $\begin{matrix} {{{\hat{f}}_{im}\overset{\Delta}{=}{{{- \log}\quad \left( {{smooth}\left\{ \frac{Y_{mi} - r_{mi}}{I_{mi}} \right\}} \right)} \approx {f_{im}\quad \left( s_{i} \right)}}},} & (14) \end{matrix}$

where often some radial smoothing is included to reduce noise. By a Taylor expansion, in the absence of smoothing the variance of these $\hat{f}_{im}$'s is approximately: $\begin{matrix} {\mathrm{Var}\left\{ \hat{f}_{im} \right\} \approx \frac{\mathrm{Var}\left\{ Y_{mi} \right\}}{\left( \bar{y}_{mi} - r_{mi} \right)^{2}}.} & (15) \end{matrix}$
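The estimate (14) without smoothing, together with the variance approximation (15), can be sketched as below. The sketch assumes Poisson-distributed measurements, so that Var{Y_(mi)} equals the mean, and it substitutes the observed Y_(mi) for the unknown mean; both choices are assumptions of the illustration.

```python
import math

def log_estimate_and_variance(y, intensity, background=0.0):
    """Sketch of (14) with no smoothing, plus a plug-in version of (15).

    Assumes Poisson noise and uses the observed value y in place of the
    unknown mean y_bar in both the numerator and the denominator of (15).
    """
    f_hat = -math.log((y - background) / intensity)
    var_f_hat = y / (y - background) ** 2
    return f_hat, var_f_hat

f_hat, var_f_hat = log_estimate_and_variance(1000.0, 10000.0)
```

When the background term is negligible the variance is roughly 1/Y_(mi), so rays with few detected photons yield much noisier log estimates than rays with many.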

Ignoring measurement noise, in the usual case where L_(mi)=L_(i) is independent of m, one can view (14) as a system of N_(s) nonlinear equations in L unknowns for the ith ray, where the lth unknown is s_(il) defined in (12), namely the ith line integral through the lth basis material. If N_(s)≧L, then for each i, one can solve these nonlinear equations by iterative methods or by polynomial approximation. Mathematically, a natural strategy would be least squares: $\begin{matrix} {\hat{s}_{i} = \arg\min\limits_{s \in \mathbb{R}^{L}} \sum\limits_{m = 1}^{N_{s}} w_{mi}\,\frac{1}{2}\left( \hat{f}_{im} - f_{im}(s) \right)^{2},} & (16) \end{matrix}$

for i=1, . . . ,N_(d), where w_(mi) is a weighting corresponding to the reciprocal of an estimate of the variance of {circumflex over (ƒ)}_(im). Based on (15), a reasonable choice is simply w_(mi)=Y_(mi) if the measurements are approximately Poisson and if the r_(mi)'s are small. Usually we have N_(s)=L, and in this case the above least-squares estimator degenerates into simply solving the system of equations (14), yielding estimates ŝ_(i) of the s_(i)'s of the following form: $\begin{matrix} {{{\hat{s}}_{i}\overset{\Delta}{=}{f_{i}^{- 1}\quad \left( {\hat{f}}_{i} \right)}},} & (17) \end{matrix}$

where $f_{i}\overset{\Delta}{=}{\left( {f_{i1},\ldots \quad,f_{{iN}_{s}}} \right).}$

This is the classical dual-energy “preprocessing” approach.
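As an illustration of solving the system (14) when N_(s)=L=2, the following Python sketch applies Newton's method to a two-bin discretization of the energy integral in (11). The spectra, the attenuation values, and the helper names (`f_vec`, `jacobian`, `decompose`) are hypothetical choices made for this example only. Newton's method is one possible iterative solver and is not prescribed by the text.

```python
import math

# Hypothetical two-bin spectra SPECTRA[m][e] and attenuation values BETA[k][e].
SPECTRA = [[1.0, 0.2],   # source setting m = 1
           [0.2, 1.0]]   # source setting m = 2
BETA = [[0.3, 0.2],      # basis material 1
        [0.7, 0.3]]      # basis material 2


def _transmitted(m, s):
    """Transmitted intensity per energy bin for source m and line integrals s."""
    return [SPECTRA[m][e] * math.exp(-sum(BETA[k][e] * s[k] for k in range(2)))
            for e in range(2)]


def f_vec(s):
    """Discretized f_im(s) of (11) for the two source settings."""
    return [-math.log(sum(_transmitted(m, s)) / sum(SPECTRA[m]))
            for m in range(2)]


def jacobian(s):
    """Partial derivatives of f_im with respect to s_il."""
    rows = []
    for m in range(2):
        w = _transmitted(m, s)
        total = sum(w)
        rows.append([sum(w[e] * BETA[k][e] for e in range(2)) / total
                     for k in range(2)])
    return rows


def decompose(f_hat, iterations=20):
    """Solve f(s) = f_hat by Newton's method, starting from s = 0."""
    s = [0.0, 0.0]
    for _ in range(iterations):
        f = f_vec(s)
        J = jacobian(s)
        r0, r1 = f_hat[0] - f[0], f_hat[1] - f[1]
        det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
        # Cramer's rule for the 2x2 Newton step J * delta = residual.
        s[0] += (r0 * J[1][1] - J[0][1] * r1) / det
        s[1] += (J[0][0] * r1 - J[1][0] * r0) / det
    return s


s_true = [2.0, 0.5]
s_est = decompose(f_vec(s_true))  # noiseless data generated from s_true
print(s_est)
```

With noisy data the same solver would be applied to the smoothed log measurements of (14), and the noise in the result is governed by the conditioning of the Jacobian.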

After having estimated the ŝ_(i)'s in sinogram space, one must use these ŝ_(i)'s to reconstruct images of the basis components (e.g., soft tissue and bone). The classical approach is to apply the FBP method separately to each sinogram {ŝ_(il)}_(i=1)^(N_(d)) to form estimated component images {circumflex over (ρ)}_(l)({right arrow over (x)}). The FBP method usually yields unacceptably noisy estimates of the component images, hampering its acceptance. (Convex combinations of the component images have at best the same SNR as conventional X-ray CT images.)

To understand the source of this noise, it is instructive to analyze the noise properties of (17) or more generally (16). Using error propagation analysis, one can show that the covariance of ŝ_(i) is approximately

Cov{ŝ _(i)}≈[(∇_(R)ƒ_(i))W _(i)(∇_(C)ƒ_(i))]⁻¹(∇_(R)ƒ_(i))W _(i) Cov{{circumflex over (ƒ)} _(i) }W _(i)(∇_(C)ƒ_(i))[(∇_(R)ƒ_(i))W _(i)(∇_(C)ƒ_(i))]⁻¹,  (18)

where W_(i)=diag{w_(mi)} and ∇_(R) and ∇_(C) denote row and column gradients, respectively. In the usual case where W_(i)=Cov{{circumflex over (ƒ)}_(i)}⁻¹ as described above, then this covariance simplifies to

Cov{ŝ _(i)}≈[(∇_(R)ƒ_(i))Cov{{circumflex over (ƒ)} _(i)}⁻¹(∇_(C)ƒ_(i))]⁻¹,  (19)

where one evaluates the gradients at the mean of ŝ_(i).

If the N_(s)×L matrix ∇_(R)ƒ_(i) had orthogonal columns, then the inversion step (17) would not amplify noise. But in practice ∇_(R)ƒ_(i) can be quite poorly conditioned, and examining its conditioning can provide insight into the challenges in dual-energy CT reconstruction. Note that $\left\lbrack \nabla_{C}f_{i}(s) \right\rbrack_{ml} = \frac{\partial}{\partial s_{il}}f_{im}(s) = \frac{1}{{\bar{y}}_{mi} - r_{mi}}\int I_{mi}(\varepsilon)\beta_{l}(\varepsilon)e^{- \beta^{\prime}(\varepsilon)s}\,d\varepsilon.$

In particular,

[∇_(C)ƒ_(i)(s)]_(ml)|_(s=0)={overscore (β)}_(iml),

where the “effective” mass attenuation coefficient is defined as follows: $\begin{matrix} {{\bar{\beta}}_{iml}\overset{\Delta}{=}\frac{\int\beta_{l}(\varepsilon)I_{mi}(\varepsilon)\,d\varepsilon}{I_{mi}}.} & (20) \end{matrix}$

One can explore the instability of dual-energy reconstruction by examining the N_(s)×L matrix B_(i) with entries {overscore (β)}_(iml) for various material basis functions and source spectra.

As an example, for the data shown in FIGS. 2a and 2b and FIG. 3, we compute: ${B_{i} = \left\{ {\bar{\beta}}_{iml} \right\} = \begin{bmatrix} 0.264 & 0.655 \\ 0.199 & 0.309 \end{bmatrix}},$

which has condition number 13.0. Furthermore, the “noise amplification” matrix [B_(i)′B_(i)]⁻¹ in (19) has diagonal entries that are roughly 10².
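The conditioning figures quoted above can be reproduced with a short calculation. The following Python sketch is a minimal check that uses only the matrix B_(i) given in the text; the function names are hypothetical, and the closed-form 2×2 eigenvalue formula is used simply to avoid any library dependency.

```python
import math

# Effective mass attenuation matrix B_i quoted in the text
# (rows: source spectra m, columns: basis materials l).
B = [[0.264, 0.655],
     [0.199, 0.309]]


def gram(B):
    """Return the 2x2 Gram matrix B'B."""
    a = B[0][0] ** 2 + B[1][0] ** 2
    b = B[0][0] * B[0][1] + B[1][0] * B[1][1]
    d = B[0][1] ** 2 + B[1][1] ** 2
    return [[a, b], [b, d]]


def condition_number(B):
    """2-norm condition number: square root of the eigenvalue ratio of B'B."""
    G = gram(B)
    trace = G[0][0] + G[1][1]
    det = G[0][0] * G[1][1] - G[0][1] ** 2
    lam_max = 0.5 * (trace + math.sqrt(trace * trace - 4.0 * det))
    lam_min = det / lam_max  # product of the eigenvalues equals det
    return math.sqrt(lam_max / lam_min)


def inverse_gram_diagonal(B):
    """Diagonal entries of [B'B]^(-1), the noise amplification factors."""
    G = gram(B)
    det = G[0][0] * G[1][1] - G[0][1] ** 2
    return G[1][1] / det, G[0][0] / det


cond_B = condition_number(B)
amp = inverse_gram_diagonal(B)
print(round(cond_B, 2), [round(v, 1) for v in amp])
```

The two diagonal entries differ by a factor of several, so the two basis-material sinograms are not equally noisy.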

B. WLS Approaches

Instead of using FBP to reconstruct ρ_(l)({right arrow over (x)}) from the ŝ_(i)'s, an alternative is to use a statistically-motivated reconstruction method such as a penalized weighted least-squares (PWLS) cost function. This approach is similar to that proposed by Clinthorne and Sukovic. In that work, the source was assumed to be monoenergetic, so the logarithms of the measurements were used directly. Here, one can account for the polyenergetic spectrum by first estimating the ŝ_(i)'s using (16) or (17) in sinogram space. If the source is monoenergetic, then the two variations are equivalent. For polyenergetic sources, (16) or (17) will be more accurate than simple logarithms.

Consider the separable parameterization $\begin{matrix} {{\mu \quad \left( {\overset{\rightarrow}{x},ɛ} \right)} = {\sum\limits_{l = 1}^{L}\quad {\sum\limits_{j = 1}^{N_{p}}\quad {\beta_{l}\quad (ɛ)\quad b_{j}\quad \left( \overset{\rightarrow}{x} \right)\quad x_{lj}}}}} & (21) \end{matrix}$

where β_(l)(ε) is the mass attenuation coefficient of the lth material type and {b_(j)({right arrow over (x)})} are spatial basis functions. As before, suppose that L_(mi) is independent of m, i.e., the same rays are collected for each incident energy setting. Then the integral in (12) simplifies as follows: ${\int_{L_{mi}}\rho_{l}\left( \overset{\rightarrow}{x} \right)dl} = {s_{il}\left( x_{l} \right)}\overset{\Delta}{=}\sum\limits_{j = 1}^{N_{p}}a_{ij}x_{lj}$

where a_(ij) was defined in (9).

Having estimated the ŝ_(i)'s using (16) or (17) in sinogram space, one must then solve for x. A natural statistical approach is to use a PWLS criterion as follows: $\begin{matrix} {\hat{x} = \arg\min\limits_{x \in \mathbb{R}^{N_{p} \times L}}\Psi(x),\quad \Psi(x)\overset{\Delta}{=}\sum\limits_{i = 1}^{N_{d}}\frac{1}{2}\left( {\hat{s}}_{i} - s_{i}(x) \right)^{\prime}W_{i}\left( {\hat{s}}_{i} - s_{i}(x) \right) + R(x),} & (22) \end{matrix}$

where W_(i)εR^(L×L) is a weighting matrix derived from an estimate of the covariance of ŝ_(i), $s_{i}(x)\overset{\Delta}{=}\left( s_{i1}\left( x_{1} \right),\ldots,s_{iL}\left( x_{L} \right) \right)$, and R(x) is a regularization function. There are numerous iterative algorithms suitable for this quadratic minimization problem, such as coordinate descent, or conjugate gradients, or ordered-subsets approaches.

There are three natural choices for the covariance estimates W_(i). The simplest choice would be unweighted, where each W_(i) is simply the L×L identity matrix. Although the most convenient to use, this unweighted choice may provide relatively limited improvement over FBP which is also an unweighted method. The second choice would be to let W_(i) be diagonal with diagonal entries corresponding to estimates of the variances of the elements of ŝ_(i). This variance weighted approach would reduce noise by giving less weight to the noisier rays. The final approach would be to let W_(i) be a complete estimate of the L×L covariance matrix of ŝ_(i), as described in (18). If the source spectra were monoenergetic, then one can show that this latter approach is essentially equivalent to the method of Sukovic and Clinthorne. Thus, the method (22) is essentially a generalization to the polyenergetic case.

The disadvantage of the above PWLS methods relative to the ML approach described hereinbelow is that the nonlinear preprocessing that leads to the ŝ_(i)'s obscures their statistical distribution and seems to limit one to least-squares formulations. In contrast, the ML approach can use a very accurate statistical model. However, the ML approach is complicated by the nonlinearity of the physical model (1) and the generality of the statistical models considered.

Statistical Model and Likelihood

This section describes assumptions about the measurement statistics and formulates the log-likelihood.

A. Proposed Polyenergetic Object Model

As noted above, most prior work has considered bases that are separable in the spatial and energy (or material density) dimensions, as in (2) and (3). In the interest of generality here, the algorithm is derived under the following very flexible parameterization: $\begin{matrix} {{{\mu \left( {\overset{\rightarrow}{x},ɛ} \right)} = {\sum\limits_{k = 1}^{K_{b}}{{\chi_{k}\left( {\overset{\rightarrow}{x},ɛ} \right)}x_{k}}}},} & (23) \end{matrix}$

where K_(b) is the number of basis functions and x_(k) is the unknown coefficient of the kth basis function. By taking K_(b) sufficiently large and using suitably localized χ_(k)'s, any function μ can be approximated to arbitrary accuracy by (23). Both of the preceding parameterizations (2) and (3) are special cases of (23). For the usual two-material separable parameterization, we have K_(b)=2N_(p), where N_(p) is the number of voxels. A non-separable basis may be useful, for example, if a certain material component (such as a metal implant) is known a priori to be present only in certain image locations. This may be useful even for the bone-mineral component if a priori segmentation can adequately identify the bone regions.

Using the general parameterization (23), the inner integral in (1) becomes: $\begin{matrix} {\int_{L_{mi}}\mu\left( \overset{\rightarrow}{x},\varepsilon \right)dl} & {= \int_{L_{mi}}\left\lbrack \sum\limits_{k = 1}^{K_{b}}\chi_{k}\left( \overset{\rightarrow}{x},\varepsilon \right)x_{k} \right\rbrack dl} \\ & {= \sum\limits_{k = 1}^{K_{b}}a_{mik}(\varepsilon)x_{k}\overset{\Delta}{=}\left\lbrack A_{m}(\varepsilon)x \right\rbrack_{i}} \end{matrix}$

where the coefficient vector $x\overset{\Delta}{=}\left( x_{1},\ldots,x_{K_{b}} \right),$

and where A_(m)(ε) is a N_(d)×K_(b) matrix with elements $\left\lbrack A_{m}(\varepsilon) \right\rbrack_{ik} = a_{mik}(\varepsilon)\overset{\Delta}{=}\int_{L_{mi}}\chi_{k}\left( \overset{\rightarrow}{x},\varepsilon \right)dl,$

for i=1, . . . ,N_(d), k=1, . . . ,K_(b). Substituting into (1) yields the following discrete-object discrete-data mean model:

$\begin{matrix} {{\bar{y}}_{mi}(x) = \int I_{mi}(\varepsilon)e^{- {\lbrack{A_{m}(\varepsilon)x}\rbrack}_{i}}\,d\varepsilon + r_{mi}.} & (24) \end{matrix}$

In the absence of noise, our goal would be to estimate x from the measurements {Y_(mi)} using the model (24).
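To make the mean model (24) concrete, the following Python sketch approximates the energy integral by a Riemann sum over a small number of energy bins. The bin values, the bin width `d_eps`, and the function name `mean_measurement` are assumptions made for this illustration; the text does not specify a discretization.

```python
import math


def mean_measurement(spectrum, a_rows, x, r, d_eps):
    """Approximate (24) for one ray i and one source setting m.

    spectrum[e] : source intensity I_mi at energy bin e
    a_rows[e]   : the values a_mik at energy bin e, for k = 1..K_b
    x           : coefficient vector of length K_b
    r           : background term r_mi
    d_eps       : width of each energy bin
    """
    total = 0.0
    for intensity, a_row in zip(spectrum, a_rows):
        line_integral = sum(a * xk for a, xk in zip(a_row, x))  # [A_m(eps) x]_i
        total += intensity * math.exp(-line_integral) * d_eps
    return total + r


# Hypothetical example: three energy bins and K_b = 2 coefficients.
spectrum = [100.0, 300.0, 200.0]
a_rows = [[0.5, 1.2], [0.4, 0.8], [0.3, 0.5]]
r = 5.0
d_eps = 1.0

y_blank = mean_measurement(spectrum, a_rows, [0.0, 0.0], r, d_eps)
y_object = mean_measurement(spectrum, a_rows, [1.0, 0.5], r, d_eps)
print(y_blank, y_object)
```

With x=0 the sum reduces to the total intensity plus r_(mi), and any nonnegative x with nonnegative a_(mik) values lowers the mean.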

B. Statistical Methods

If one used photon-counting detectors with modest deadtimes, then it would be reasonable to assume that the measurements are statistically independent Poisson random variables with means (1), i.e.,

Y _(mi)˜Poisson {{overscore (y)} _(mi) [x]}.

In this case, for a given measurement realization Y_(mi)=y_(mi), the corresponding negative log-likelihood of x has the form ${- L(x)} \equiv \sum\limits_{m = 1}^{N_{s}}\sum\limits_{i = 1}^{N_{d}}\left\lbrack {\bar{y}}_{mi}(x) - y_{mi}\log{\bar{y}}_{mi}(x) \right\rbrack,$

where ≡ means “equal to within irrelevant constants independent of x.” This is the model used in most statistical image reconstruction methods for transmission tomography to date and it is natural for photon-counting detectors such as those used in PET and SPECT transmission scans.
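A minimal Python sketch of this Poisson negative log-likelihood, with the constants independent of x dropped, is given below. The function name and the sample counts are hypothetical; the means would in practice come from the model (24).

```python
import math


def poisson_nll(y_meas, y_mean):
    """Sum of ybar - y * log(ybar) over all measurements (constants dropped)."""
    return sum(yb - y * math.log(yb) for y, yb in zip(y_meas, y_mean))


# Hypothetical measured counts for three rays.
y_meas = [120.0, 80.0, 45.0]

nll_at_data = poisson_nll(y_meas, y_meas)                     # means equal to the data
nll_shifted = poisson_nll(y_meas, [1.1 * y for y in y_meas])  # means 10% too large
print(nll_at_data, nll_shifted)
```

Each term is smallest when the mean equals the measured count, so moving the means away from the data raises the cost.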

Although photon-counting X-ray detectors do exist, commercial X-ray CT systems use current integrating detectors that yield energy-dependent signals and additional electronic noise variance beyond that due to Poisson counting variability. To first order, additive electronic noise can be approximated within the Poisson model using the r_(mi) terms in (1) by a simple modification of the “shifted Poisson” approach. It is likely that the “exact” likelihood for such detectors is analytically intractable, so approximations will undoubtedly be used in practice. For example, Clinthorne describes a sophisticated point-process model for X-ray detection and uses its first and second moments. Rather than postulating and attempting to validate any particular approximate statistical model in this application, the algorithms are derived under very general assumptions that will accommodate a wide variety of log-likelihood models and approximations that might be proposed in the future.

The following four assumptions are made about the measurement statistics.

1. The measurements {Y_(mi)} are statistically independent. Due to effects like scintillator afterglow and electronics lag, statistical independence may not hold exactly in practice, but it is likely to be an accurate approximation for most X-ray CT systems. Accounting for whatever statistical dependencies may be present in real systems would likely be quite challenging.

2. The marginal negative log-likelihood of Y_(mi) has the form ψ_(mi)({overscore (y)}_(mi)(x)) for some scalar function ψ_(mi). For example, if the measurements have Poisson distributions, then

ψ_(mi)(y)=y−y_(mi) log y.  (25)

 This is perhaps the simplest case, but we allow for much more general ψ_(mi)'s in the derivation.

3. The final two assumptions are more technical and concern the existence of convenient surrogate functions for the ψ_(mi)'s of interest. It is believed that all physically plausible ψ_(mi)'s will satisfy these quite general assumptions. They are certainly satisfied for Poisson and Gaussian statistical models.

 For each ψ_(mi), it is assumed that there exists a corresponding scalar surrogate function h_(mi)(•,•) that is convex on (0,∞) in its first argument. By surrogate function, it is meant a function that satisfies

h _(mi)(y,y)=ψ _(mi)(y), ∀y>0  (26)

h _(mi)(y,z)≧ψ_(mi)(y), ∀y>0.  (27)

 These conditions are the key to deriving an iterative algorithm that monotonically decreases the cost function defined below. For each z>0, it is also assumed that h_(mi)(•,z) is differentiable in its first argument in an open interval around z. This assumption, combined with (26) and (27), ensures the following tangent condition:

{dot over (h)} _(mi)(z,z)={dot over (ψ)}_(mi)(z), ∀z>0,  (28)

 where ${\dot{h}}_{mi}(y,z)\overset{\Delta}{=}\frac{\partial}{\partial y}h_{mi}(y,z).$

4. Convexity alone may be sufficient for some types of iterative minimization algorithms. However, to enable use of very simple descent methods, we will find parabolic surrogates. The following assumption ensures that the necessary parabola exists, which it certainly does in the Poisson case among others.

 For any x≧0, the function: $\begin{matrix} {{g_{mi}\left( {l,x,ɛ} \right)}\overset{\Delta}{=}{h_{mi}\left( {{{{b_{mi}\left( {x,ɛ} \right)}e^{- l}} + {r_{mi}\left( {x,ɛ} \right)}},{{\overset{\_}{y}}_{mi}(x)}} \right)}} & (29) \end{matrix}$

 is assumed to have a quadratic surrogate for l≧0, where the following functions are defined for later use: $\begin{matrix} {b_{mi}(x,\varepsilon)\overset{\Delta}{=}{\bar{y}}_{mi}(x)/t_{mi}(x,\varepsilon)} & (30) \\ {t_{mi}(x,\varepsilon)\overset{\Delta}{=}\exp\left( - \left\lbrack A_{m}(\varepsilon)x \right\rbrack_{i} \right) + r_{mi}/I_{mi}} & (31) \\ {r_{mi}(x,\varepsilon)\overset{\Delta}{=}b_{mi}(x,\varepsilon)\,r_{mi}/I_{mi}.} & (32) \end{matrix}$

In other words, it is assumed that there exists a curvature function c_(mi)(x,ε) such that the following parabola is a surrogate for g_(mi): $\begin{matrix} {{{q_{mi}\left( {l,x,ɛ} \right)} = {{g_{mi}\left( {\left\lbrack {{A_{m}(ɛ)}x} \right\rbrack_{i},x,ɛ} \right)} + {{{\overset{.}{g}}_{mi}\left( {\left\lbrack {{A_{m}(ɛ)}x} \right\rbrack_{i},x,ɛ} \right)}\left( {l - \left\lbrack {{A_{m}(ɛ)}x} \right\rbrack_{i}} \right)} + {\frac{1}{2}{c_{mi}\left( {x,ɛ} \right)}\left( {l - \left\lbrack {{A_{m}(ɛ)}x} \right\rbrack_{i}} \right)^{2}}}},} & (33) \end{matrix}$

where ${{\overset{.}{g}}_{mi}\left( {l,x,ɛ} \right)}\overset{\Delta}{=}{\frac{\partial}{\partial l}{{g_{mi}\left( {l,x,ɛ} \right)}.}}$

In assuming that q_(mi) is a surrogate for g_(mi), it is meant that c_(mi) is such that

q _(mi)(l,x,ε)≧g _(mi)(l,x,ε), ∀x≧0, ∀l≧0.  (34)

The construction (33) provides the following two surrogate properties: $\begin{matrix} {\left. q_{mi}(l,x,\varepsilon) \right|_{l = {\lbrack{A_{m}(\varepsilon)x}\rbrack}_{i}} = g_{mi}\left( \left\lbrack A_{m}(\varepsilon)x \right\rbrack_{i},x,\varepsilon \right),} \\ {\left. {\dot{q}}_{mi}(l,x,\varepsilon) \right|_{l = {\lbrack{A_{m}(\varepsilon)x}\rbrack}_{i}} = {\dot{g}}_{mi}\left( \left\lbrack A_{m}(\varepsilon)x \right\rbrack_{i},x,\varepsilon \right).} \end{matrix}$

B.1 Existence of Convex Surrogates

The existence of a differentiable convex surrogate h_(mi) satisfying (26) and (27) always holds when ψ_(mi) is twice differentiable, which it will always be for physically plausible statistical models.

Let ψ(y) be any twice differentiable function and define $\begin{matrix} {h(y,z) = \psi(z) + \dot{\psi}(z)(y - z) + \int_{z}^{y}(y - \tau)\max\left\{ \ddot{\psi}(\tau),0 \right\} d\tau.} & (35) \end{matrix}$

This surrogate h is convex and (twice) differentiable and satisfies (26) and (27). The construction (35) may not be the optimal surrogate in terms of convergence rate, but it confirms that the third assumption above is unrestrictive.

Of course, if ψ_(mi) is itself convex, such as in the Poisson case, then one simply takes h_(mi)(y,•)=ψ_(mi)(y).
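As a reading aid (this short verification is not part of the original derivation and assumes that the second derivative of ψ is continuous), the majorization property (27) of the construction (35) can be checked against the Taylor expansion of ψ with integral remainder: $\begin{matrix} {\psi(y) = \psi(z) + \dot{\psi}(z)(y - z) + \int_{z}^{y}(y - \tau)\ddot{\psi}(\tau)\,d\tau,} \\ {h(y,z) - \psi(y) = \int_{z}^{y}(y - \tau)\left\lbrack \max\left\{ \ddot{\psi}(\tau),0 \right\} - \ddot{\psi}(\tau) \right\rbrack d\tau \geq 0.} \end{matrix}$

The bracketed factor is nonnegative, and the product (y−τ)dτ is nonnegative for either orientation of the integral (y>z or y<z), so the difference is nonnegative. It vanishes at y=z, which gives (26). Differentiating (35) twice with respect to y gives $\frac{\partial^{2}}{\partial y^{2}}h(y,z) = \max\left\{ \ddot{\psi}(y),0 \right\} \geq 0$, which confirms convexity in the first argument.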

B.2 Existence of Parabola Surrogates

To derive a specific algorithm for a particular negative log-likelihood ψ_(mi) one will need to determine the c_(mi) function in (33) by careful analysis. In the case of Poisson measurements, where ψ_(mi)(y)=h_(mi)(y,•)=y−y_(mi) log y, the optimal c_(mi) function was shown to be: $\begin{matrix} {c_{mi}^{opt}(x,\varepsilon)\overset{\Delta}{=}c_{opt}\left( \left\lbrack A_{m}(\varepsilon)x \right\rbrack_{i},y_{mi},b_{mi}(x,\varepsilon),r_{mi}(x,\varepsilon) \right),} & (36) \end{matrix}$

where $\begin{matrix} {{c_{opt}\left( {l,y,b,r} \right)} = \left\{ \begin{matrix} {\left\lbrack {{- 2}\frac{z\left( {l,y,b,r} \right)}{l^{2}}} \right\rbrack_{+},{l > 0}} \\ {\left\lbrack {- {\overset{¨}{g}\left( {l,y,b,r} \right)}} \right\rbrack_{+},{l = 0},} \end{matrix} \right.} & (37) \end{matrix}$

where $\begin{matrix} {{{z\left( {l,y,b,r} \right)}\overset{\Delta}{=}{{g\left( {0,y,b,r} \right)} - {g\left( {l,y,b,r} \right)} + {{\overset{.}{g}\left( {l,y,b,r} \right)}l}}}{{g\left( {l,y,b,r} \right)} = {\left( {{be}^{- l} + r} \right) - {y\quad {\log \left( {{be}^{- l} + r} \right)}}}}} & (38) \\ {{\overset{.}{g}\left( {l,y,b,r} \right)} = {{\frac{\partial}{\partial l}g} = {\left\lbrack {1 - \frac{y}{{be}^{- l} + r}} \right\rbrack \left( {- 1} \right){be}^{- l}}}} & (39) \\ {{\overset{¨}{g}\left( {l,y,b,r} \right)} = {{\frac{\partial^{2}}{\partial l^{2}}g} = {\left\lbrack {1 - \frac{y\quad r}{\left( {{be}^{- l} + r} \right)^{2}}} \right\rbrack {{be}^{- l}.}}}} & (40) \end{matrix}$

By “optimal” is meant the choice that leads to the fastest convergence.

It was also shown that the curvature choice (37) is optimal not only for Poisson measurements, but also for a fairly broad family of negative log-likelihoods.

Alternatively, if g_(mi) has bounded curvature, then one could use the upper bound on that curvature as the choice for c_(mi). This approach was called “maximum curvature.” It is the simplest choice, but is suboptimal in terms of convergence rate. To summarize, assuming existence of parabola surrogates should not unduly restrict the class of statistical models.
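The Poisson-case functions (38)-(40) and the resulting parabola (33) can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: the sketch takes the curvature as [2z/l²]₊ for l>0 and [g̈(0)]₊ at l=0, which is the sign convention under which the parabola passes through g at l=0 given the definitions (38)-(40). The numerical values of y, b, and r are hypothetical.

```python
import math


def g(l, y, b, r):
    """Poisson negative log-likelihood along one ray, as in (38)."""
    u = b * math.exp(-l) + r
    return u - y * math.log(u)


def g_dot(l, y, b, r):
    """First derivative of g with respect to l, as in (39)."""
    e = b * math.exp(-l)
    return -(1.0 - y / (e + r)) * e


def g_ddot(l, y, b, r):
    """Second derivative of g with respect to l, as in (40)."""
    e = b * math.exp(-l)
    return (1.0 - y * r / (e + r) ** 2) * e


def c_opt(l, y, b, r):
    """Curvature for which the parabola expanded at l touches g at 0."""
    if l > 0:
        z = g(0.0, y, b, r) - g(l, y, b, r) + g_dot(l, y, b, r) * l
        return max(2.0 * z / (l * l), 0.0)
    return max(g_ddot(0.0, y, b, r), 0.0)


def parabola(t, l, y, b, r):
    """Parabola (33) expanded about l and evaluated at t."""
    d = t - l
    return g(l, y, b, r) + g_dot(l, y, b, r) * d + 0.5 * c_opt(l, y, b, r) * d * d


# Hypothetical values: measured count y, blank-scan factor b, background r.
y, b, r = 50.0, 100.0, 5.0
grid = [0.05 * k for k in range(201)]  # t from 0 to 10
gap = min(parabola(t, 1.0, y, b, r) - g(t, y, b, r) for t in grid)
print(gap)
```

On this grid the parabola stays at or above g, touching it at the expansion point and at t=0. The "maximum curvature" alternative would replace `c_opt` by a fixed upper bound on `g_ddot`, giving a narrower parabola and smaller steps.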

C. Likelihood Formulation

Under the above assumptions, including statistical independence of the transmission measurements, the negative log-likelihood corresponding to the above physical model has the form $\begin{matrix} {{{- {L(x)}} \equiv {\Psi (x)}}\overset{\Delta}{=}{\sum\limits_{m = 1}^{N_{s}}{\sum\limits_{i = 1}^{N_{d}}{\psi_{mi}\left( {{\overset{\_}{y}}_{mi}(x)} \right)}}}} & (41) \end{matrix}$

for some scalar functions ψ_(mi) that depend on the selected statistical model. Our goal is to estimate the coefficient vector x from the measurements {Y_(mi)} by maximizing the log-likelihood or, equivalently, by finding a minimizer of the cost function Ψ (or a regularized version thereof): ${\hat{x}}_{ML}\overset{\Delta}{=}{\arg \quad {\min\limits_{x}\quad {{\Psi (x)}.}}}$

Optimization is restricted to the valid parameter space (i.e., including non-negativity constraints, etc.). Ignoring any constraints, in principle one could find a minimizer by zeroing the following partial derivatives of the cost function: $\begin{matrix} {\frac{\partial}{\partial x_{k}}\Psi(x)} & {= \sum\limits_{m = 1}^{N_{s}}\sum\limits_{i = 1}^{N_{d}}{\dot{\psi}}_{mi}\left( {\bar{y}}_{mi}(x) \right)\frac{\partial}{\partial x_{k}}{\bar{y}}_{mi}(x)} & \\ & {= \sum\limits_{m = 1}^{N_{s}}\sum\limits_{i = 1}^{N_{d}}{\dot{\psi}}_{mi}\left( {\bar{y}}_{mi}(x) \right) \cdot ( - 1)\int I_{mi}(\varepsilon)a_{mik}(\varepsilon)e^{- {\lbrack{A_{m}(\varepsilon)x}\rbrack}_{i}}\,d\varepsilon,} & (42) \end{matrix}$

where ${\dot{\psi}}_{mi}(y)\overset{\Delta}{=}\frac{d}{dy}\psi_{mi}(y).$

In general, there is no closed form solution to the set of K_(b) equations (42), so iterative algorithms are required.

Although many algorithms have been proposed for the monoenergetic problem (8), none of those previously proposed algorithms is suitable for minimizing the cost function Ψ(x) in the polyenergetic case. The greatest difficulty is the integral over energy in (24). Substituting a summation for this integral does not significantly simplify the problem. Further difficulties arise due to the nonlinearity of Beer's law in (1), and due to the nonquadratic form of typical choices for ψ_(mi) (cf. (25)). In the next section, optimization transfer principles are applied to derive an iterative algorithm that monotonically decreases the cost function each iteration. It should converge to a local minimizer, and should converge to the global minimizer if the cost function is unimodal. (The cost function is convex in the monoenergetic case under the Poisson model if the r_(mi)'s are zero.) Global convergence needs further examination. Many variations on this basic algorithm are possible. In particular, one could apply many general purpose minimization methods to minimize Ψ, but most such methods would not provide monotonic decreases.

ML Algorithm

Since the cost function Ψ(x) is difficult to minimize directly, optimization transfer principles are applied to develop an algorithm that monotonically decreases Ψ(x) each iteration. (The extension to the penalized-likelihood case is straightforward, so the ML case is focused on here. The challenging part is the log-likelihood, not the penalty function.) To apply optimization transfer, at the nth iteration, one would like to find an easily-minimized surrogate function (the superscript “n” denotes an iteration index, not a power) φ(x,x^((n)))=φ^((n))(x) that satisfies the majorizing conditions

φ^((n))(x ^((n)))=Ψ(x ^((n)))  (43)

φ^((n))(x)≧Ψ(x).  (44)

One then implements the following iteration: $\begin{matrix} {x^{({n + 1})} = {\arg \quad {\min\limits_{x}\quad {{\varphi^{(n)}(x)}.}}}} & (45) \end{matrix}$

Optimization is restricted to the valid parameter space, such as {x≧0}. The conditions (43) and (44) ensure that an algorithm of the form (45) will monotonically decrease Ψ each iteration, since Ψ(x^((n+1)))≦φ^((n))(x^((n+1)))≦φ^((n))(x^((n)))=Ψ(x^((n))).
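The optimization transfer iteration (45) can be illustrated on a one-dimensional toy problem. In the Python sketch below, the cost function, its curvature bound of 1/4, and all names are assumptions chosen for illustration; they are not the tomographic cost Ψ of (41). The surrogate is a parabola that matches the cost and its derivative at the current iterate, so it satisfies (43) and (44), and its constrained minimizer over x≧0 has a closed form.

```python
import math

CURVATURE = 0.25  # upper bound on the second derivative of the toy cost


def cost(x):
    """Toy cost log(1 + e^x) - 0.7 x, standing in for Psi."""
    return math.log1p(math.exp(x)) - 0.7 * x


def cost_dot(x):
    """Derivative of the toy cost."""
    return 1.0 / (1.0 + math.exp(-x)) - 0.7


def mm_step(x_n):
    """Minimize the parabola surrogate about x_n subject to x >= 0."""
    return max(0.0, x_n - cost_dot(x_n) / CURVATURE)


x = 5.0
history = [cost(x)]
for _ in range(50):
    x = mm_step(x)
    history.append(cost(x))

print(x, history[0], history[-1])
```

The recorded cost values never increase, and the iterates approach the point where the derivative vanishes, log(0.7/0.3).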

A suitable surrogate function is derived in this section by using several optimization transfer principles: DePierro's multiplicative convexity trick, parabola surrogates, and DePierro's additive convexity trick. The development partially parallels the derivation of a monotonic algorithm for SPECT transmission scans with overlapping beams. As described herein, the overlap is spectral rather than spatial. The final result of the derivation is the diagonally-scaled gradient descent algorithm (62) below.

A. Convex Surrogate

In general, Ψ(x) will not be convex, so a convex surrogate is first formed to simplify minimization. Using the convex surrogate h_(mi) described in (27), one defines $\begin{matrix} {{{h_{mi}^{(n)}(y)}\overset{\Delta}{=}{h_{mi}\left( {y,{{\overset{\_}{y}}_{mi}\left( x^{(n)} \right)}} \right)}},} & (46) \end{matrix}$

and then constructs the following surrogate function for Ψ: $\begin{matrix} {{\varphi_{0}^{(n)}(x)}\overset{\Delta}{=}{\sum\limits_{m = 1}^{N_{s}}{\sum\limits_{i = 1}^{N_{d}}{{h_{mi}^{(n)}\left( {{\overset{\_}{y}}_{mi}(x)} \right)}.}}}} & (47) \end{matrix}$

It follows from (26) and (27) that the surrogate φ₀ ^((n)) satisfies the monotonicity conditions (43) and (44).

B. Surrogate Based on Multiplicative Convexity Trick

Typically, φ₀ ^((n)) is also difficult to minimize directly, so the next step is to further simplify by deriving a surrogate function using DePierro's multiplicative convexity trick, generalized from summations to integrals. One first rewrites {overscore (y)}_(mi) in (24) as follows: $\begin{matrix} {{\bar{y}}_{mi}(x)} & {= \int I_{mi}(\varepsilon)t_{mi}(x,\varepsilon)\,d\varepsilon} & \\ & {= \int\left\lbrack \frac{I_{mi}(\varepsilon)}{b_{mi}^{(n)}(\varepsilon)} \right\rbrack t_{mi}(x,\varepsilon)b_{mi}^{(n)}(\varepsilon)\,d\varepsilon,} & (48) \end{matrix}$

where t_(mi) and b_(mi) were defined in (31) and (30), and we define $\begin{matrix} {{{t_{mi}^{(n)}(ɛ)}\overset{\Delta}{=}{t_{mi}\left( {x^{(n)},ɛ} \right)}}{{b_{mi}^{(n)}(ɛ)}\overset{\Delta}{=}{{b_{mi}\left( {x^{(n)},ɛ} \right)}.}}} & (49) \end{matrix}$

(Many b_(mi) ^((n))(ε)'s would satisfy (50) and hence (44), but only the choice (49) leads to (43).) The key feature of the equality (48) is that the terms in the brackets are nonnegative and integrate to unity, enabling use of the convexity inequality (i.e., Jensen's inequality). If h is any function that is convex on (0,∞), then $\begin{matrix} {h\left( {\bar{y}}_{mi}(x) \right)} & {= h\left( \int\left\lbrack \frac{I_{mi}(\varepsilon)}{b_{mi}^{(n)}(\varepsilon)} \right\rbrack t_{mi}(x,\varepsilon)b_{mi}^{(n)}(\varepsilon)\,d\varepsilon \right)} & \\ & {\leq \int\frac{I_{mi}(\varepsilon)}{b_{mi}^{(n)}(\varepsilon)}h\left( t_{mi}(x,\varepsilon)b_{mi}^{(n)}(\varepsilon) \right)d\varepsilon.} & (50) \end{matrix}$

Since h_(mi) ^((n)) in (47) is convex by assumption, one can apply the above trick to it, thereby deriving our next surrogate function for the cost function as follows: $\begin{matrix} {\varphi_{0}^{(n)}(x)} & {= \sum\limits_{m = 1}^{N_{s}}\sum\limits_{i = 1}^{N_{d}}h_{mi}^{(n)}\left( {\bar{y}}_{mi}(x) \right)} & \\ & {\leq \sum\limits_{m = 1}^{N_{s}}\sum\limits_{i = 1}^{N_{d}}\int\frac{I_{mi}(\varepsilon)}{b_{mi}^{(n)}(\varepsilon)}h_{mi}^{(n)}\left( t_{mi}(x,\varepsilon)b_{mi}^{(n)}(\varepsilon) \right)d\varepsilon} & \\ & {\overset{\Delta}{=}\varphi_{1}^{(n)}(x).} & (51) \end{matrix}$

Using the equality t_(mi)(x^((n)),ε)b_(mi) ^((n))(ε)={overscore (y)}_(mi)(x^((n))), one can verify that φ₁ ^((n)) is a valid surrogate that satisfies the monotonicity conditions (43) and (44).

As a sanity check, in the case of monoenergetic sources we have I_(mi)(ε)=I_(m)δ(ε−ε_(m)), for which t_(mi)(x,ε_(m))={overscore (y)}_(mi)(x)/I_(m) and hence b_(mi) ^((n))(ε)=I_(m). In this case φ₁ ^((n))(x)=φ₀ ^((n))(x). So the monoenergetic problem is a degenerate special case of the algorithm derived hereafter.

The surrogate function φ₁ ^((n)) “brings the integral over ε to the outside,” simplifying the optimization. Nevertheless, φ₁ ^((n)) is difficult to minimize directly, so we find a paraboloidal surrogate function for it.

C. Paraboloidal Surrogate

The next surrogate is based on paraboloids. For brevity, define ${{r_{mi}^{(n)}(ɛ)}\overset{\Delta}{=}{r_{mi}\left( {x^{(n)},ɛ} \right)}},$

where r_(mi) was defined in (32). Using (29)-(31) we have: $\begin{matrix} {{h_{mi}^{(n)}\left( {{t_{mi}\left( {x,ɛ} \right)}{b_{mi}^{(n)}(ɛ)}} \right)}\begin{matrix} {= \quad {h_{mi}\left( {{{{b_{mi}^{(n)}(ɛ)}{\exp \left( {- \left\lbrack {{A_{m}(ɛ)}x} \right\rbrack_{i}} \right)}} + {r_{mi}^{(n)}(ɛ)}},{{\overset{\_}{y}}_{mi}\left( x^{(n)} \right)}} \right)}} \\ {{= \quad {g_{mi}^{(n)}\left( {\left\lbrack {{A_{m}(ɛ)}x} \right\rbrack_{i},ɛ} \right)}},} \end{matrix}} & (52) \end{matrix}$

where g_(mi) was defined in (29) and ${g_{mi}^{(n)}\left( {l,ɛ} \right)}\overset{\Delta}{=}{{g_{mi}\left( {l,x^{(n)},ɛ} \right)}.}$

The surrogate φ₁ ^((n)) in (51) is rewritten as follows: $\begin{matrix} {\varphi_{1}^{(n)}(x) = \sum\limits_{m = 1}^{N_{s}}\sum\limits_{i = 1}^{N_{d}}\int\frac{I_{mi}(\varepsilon)}{b_{mi}^{(n)}(\varepsilon)}g_{mi}^{(n)}\left( \left\lbrack A_{m}(\varepsilon)x \right\rbrack_{i},\varepsilon \right)d\varepsilon.} & (53) \end{matrix}$

As described hereinabove, it is assumed that g_(mi) has a parabola surrogate q_(mi) of the form (33), so these parabolas are combined using (53) to form a paraboloidal surrogate: $\begin{matrix} {\varphi_{1}^{(n)}(x)} & {\leq \sum\limits_{m = 1}^{N_{s}}\sum\limits_{i = 1}^{N_{d}}\int\frac{I_{mi}(\varepsilon)}{b_{mi}^{(n)}(\varepsilon)}q_{mi}^{(n)}\left( \left\lbrack A_{m}(\varepsilon)x \right\rbrack_{i},\varepsilon \right)d\varepsilon} & \\ & {\overset{\Delta}{=}\varphi_{2}^{(n)}(x),} & (54) \end{matrix}$

where ${q_{mi}^{(n)}\left( {l,ɛ} \right)}\overset{\Delta}{=}{{q_{mi}\left( {l,x^{(n)},ɛ} \right)}.}$

Using (33) and (34), one can verify that φ₂ ^((n)) is a valid surrogate that satisfies the monotonicity conditions (43) and (44).

φ₂ ^((n)) is a quadratic form, so one could apply any of a variety of algorithms to minimize it per (45). Two choices are focused on. One choice is coordinate descent, which is known to converge rapidly. However, coordinate descent works best when the a_(mik)(ε)'s are precomputed and stored. This storage may require more memory than current workstations allow for X-ray CT sized problems. The second choice is to find yet another surrogate, this time a separable function for which the minimization step (45) becomes trivial.

D. Coordinate Descent

Implementing a coordinate descent algorithm to minimize the paraboloidal surrogate φ₂ ^((n)) would require the following partial derivatives: $\begin{matrix} {\frac{\partial}{\partial x_{k}}\varphi_{2}^{(n)}(x)} & {= \sum\limits_{m = 1}^{N_{s}}\sum\limits_{i = 1}^{N_{d}}\int a_{mik}(\varepsilon)\frac{I_{mi}(\varepsilon)}{b_{mi}^{(n)}(\varepsilon)}{\dot{q}}_{mi}^{(n)}\left( \left\lbrack A_{m}(\varepsilon)x \right\rbrack_{i},\varepsilon \right)d\varepsilon} & \\ & {= \sum\limits_{m = 1}^{N_{s}}\sum\limits_{i = 1}^{N_{d}}\int a_{mik}(\varepsilon)\frac{I_{mi}(\varepsilon)}{b_{mi}^{(n)}(\varepsilon)}\left\lbrack {\dot{q}}_{mi}^{(n)}\left( l_{mi}^{(n)}(\varepsilon),\varepsilon \right) + c_{mi}^{(n)}(\varepsilon)\left( \left\lbrack A_{m}(\varepsilon)x \right\rbrack_{i} - l_{mi}^{(n)}(\varepsilon) \right) \right\rbrack d\varepsilon,} & (55) \end{matrix}$

where we define $\begin{matrix} {{l_{mi}^{(n)}(ɛ)}\overset{\Delta}{=}{{\left\lbrack {{A_{m}(ɛ)}x^{(n)}} \right\rbrack_{i}\quad {and}\quad {c_{mi}^{(n)}(ɛ)}}\overset{\Delta}{=}{{c_{mi}\left( {x^{(n)},ɛ} \right)}.}}} & (56) \end{matrix}$

Applying the chain rule to (29) yields:

{dot over (g)} _(mi) ^((n))(l, ε)={dot over (h)}_(mi) ^((n))(b _(mi) ^((n))(ε)e ^(−l) +r _(mi) ^((n))(ε))b _(mi) ^((n))(ε)(−1)e ^(−l),

so using (28): $\begin{matrix} {{{{\overset{.}{g}}_{mi}^{(n)}\left( {l,ɛ} \right)}}_{l = {l_{mi}^{(n)}{(ɛ)}}} = \quad {{{\overset{.}{h}}_{mi}\left( {{{b_{mi}^{(n)}(ɛ)}e^{- {l_{mi}^{(n)}{(ɛ)}}}} + {r_{mi}^{(n)}(ɛ)}} \right)}{b_{mi}^{(n)}(ɛ)}\left( {- 1} \right)e^{- {l_{mi}^{(n)}{(ɛ)}}}}} \\ {{= \quad {{{\overset{.}{\psi}}_{mi}\left( {{\overset{\_}{y}}_{mi}\left( x^{(n)} \right)} \right)}{b_{mi}^{(n)}(ɛ)}\left( {- 1} \right)e^{- {l_{mi}^{(n)}{(ɛ)}}}}},} \end{matrix}$

and hence $\begin{matrix} {\frac{\partial}{\partial x_{k}}\varphi_{2}^{(n)}(x)} & {= \sum\limits_{m = 1}^{N_{s}}\sum\limits_{i = 1}^{N_{d}}\int a_{mik}(\varepsilon)\frac{I_{mi}(\varepsilon)}{b_{mi}^{(n)}(\varepsilon)}{\dot{g}}_{mi}^{(n)}\left( l_{mi}^{(n)}(\varepsilon),\varepsilon \right)d\varepsilon + \sum\limits_{m = 1}^{N_{s}}\sum\limits_{i = 1}^{N_{d}}\int a_{mik}(\varepsilon)w_{mi}^{(n)}(\varepsilon)\left\lbrack A_{m}(\varepsilon)\left( x - x^{(n)} \right) \right\rbrack_{i}d\varepsilon} \\ & {= \sum\limits_{m = 1}^{N_{s}}\sum\limits_{i = 1}^{N_{d}}{\dot{\psi}}_{mi}\left( {\bar{y}}_{mi}\left( x^{(n)} \right) \right)\int a_{mik}(\varepsilon)I_{mi}(\varepsilon)( - 1)e^{- {\lbrack{A_{m}(\varepsilon)x^{(n)}}\rbrack}_{i}}\,d\varepsilon + \sum\limits_{m = 1}^{N_{s}}\sum\limits_{i = 1}^{N_{d}}\int a_{mik}(\varepsilon)w_{mi}^{(n)}(\varepsilon)\left\lbrack A_{m}(\varepsilon)\left( x - x^{(n)} \right) \right\rbrack_{i}d\varepsilon} \\ & {= \left. \frac{\partial}{\partial x_{k}}\Psi(x) \right|_{x = x^{(n)}} + \sum\limits_{m = 1}^{N_{s}}\sum\limits_{i = 1}^{N_{d}}\sum\limits_{j = 1}^{K_{b}}\int a_{mik}(\varepsilon)a_{mij}(\varepsilon)w_{mi}^{(n)}(\varepsilon)\,d\varepsilon\,\left( x_{j} - x_{j}^{(n)} \right),} \end{matrix}$

using (42), where ${w_{mi}^{(n)}(ɛ)}\overset{\Delta}{=}{\frac{I_{mi}(ɛ)}{b_{mi}^{(n)}(ɛ)}{{c_{mi}^{(n)}(ɛ)}.}}$

This “matched derivative” property is inherent to optimization transfer methods.

For the second partial derivatives, using (54) and (56): $\frac{\partial^{2}}{{\partial x_{k}}{\partial x_{j}}}\varphi_{2}^{(n)}(x) = \sum\limits_{m = 1}^{N_{s}}\sum\limits_{i = 1}^{N_{d}}\int a_{mik}(ɛ)a_{mij}(ɛ)w_{mi}^{(n)}(ɛ)\, dɛ.$

In particular, $\begin{matrix} {\frac{\partial^{2}}{\partial x_{k}^{2}}\varphi_{2}^{(n)}(x) = \sum\limits_{m = 1}^{N_{s}}\sum\limits_{i = 1}^{N_{d}}\int\left\lbrack a_{mik}(ɛ) \right\rbrack^{2}w_{mi}^{(n)}(ɛ)\, dɛ.} & (57) \end{matrix}$

Combining (56) and (57) leads directly to a coordinate-descent algorithm with inner update: ${x_{k}^{({n + 1})} = \left\lbrack {x_{k}^{(n)} - \frac{{{\frac{\partial}{\partial x_{k}}{\Psi (x)}}}_{x = {({x_{1}^{({n + 1})},\ldots,x_{k - 1}^{({n + 1})},x_{k}^{(n)},\ldots,x_{K_{b}}^{(n)}})}}}{\frac{\partial^{2}}{\partial x_{k}^{2}}{\varphi_{2}^{(n)}( \cdot )}}} \right\rbrack_{+}},$

where [x]₊ enforces the non-negativity constraint.

E. Surrogate Based on Additive Convexity Trick

The surrogate function φ₂ ^((n)) is a non-separable quadratic function of x (a paraboloid). Non-separability is fine for coordinate descent, but inconvenient for simultaneous update algorithms. To derive a simple simultaneous update algorithm (fully parallelizable and suitable for ordered-subsets implementation), we find next a separable paraboloid surrogate by applying DePierro's additive convexity trick. The key is the following equality (an additive analog of the multiplicative form (48)): $\left\lbrack {{A_{m}(ɛ)}x} \right\rbrack_{i} = {{\sum\limits_{k = 1}^{K_{b}}{{a_{mik}(ɛ)}x_{k}}} = {\sum\limits_{k = 1}^{K_{b}}{{\pi_{mik}(ɛ)}{u_{mik}^{(n)}\left( {x_{k},ɛ} \right)}}}}$

where ${{u_{mik}^{(n)}\left( {x_{k},ɛ} \right)}\overset{\Delta}{=}{{\frac{a_{mik}(ɛ)}{\pi_{mik}(ɛ)}\left( {x_{k} - x_{k}^{(n)}} \right)} + \left\lbrack {{A_{m}(ɛ)}x^{(n)}} \right\rbrack_{i}}},$

provided the π_(mik)(ε)'s are non-negative and are zero only when a_(mik)(ε) is zero, and that ${\sum\limits_{k = 1}^{K_{b}}{\pi_{mik}(ɛ)}} = 1.$
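As a quick check of the equality above, the following worked step (added here for clarity, using only the definitions of u_(mik) ^((n)) and π_(mik) just given, with ε written as `\varepsilon`) shows why the weighted sum reproduces the line integral. Terms with π_(mik)(ε)=0 also have a_(mik)(ε)=0, so they contribute nothing to either side.

```latex
\begin{aligned}
\sum_{k=1}^{K_b} \pi_{mik}(\varepsilon)\, u_{mik}^{(n)}(x_k,\varepsilon)
 &= \sum_{k=1}^{K_b} a_{mik}(\varepsilon)\,\bigl(x_k - x_k^{(n)}\bigr)
    + \bigl[A_m(\varepsilon)x^{(n)}\bigr]_i \sum_{k=1}^{K_b} \pi_{mik}(\varepsilon) \\
 &= \bigl[A_m(\varepsilon)x\bigr]_i - \bigl[A_m(\varepsilon)x^{(n)}\bigr]_i
    + \bigl[A_m(\varepsilon)x^{(n)}\bigr]_i \\
 &= \bigl[A_m(\varepsilon)x\bigr]_i .
\end{aligned}
```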

Since q_(mi) ^((n))(l, ε) is a convex function, by the convexity inequality (cf. (50)) we have: $q_{mi}^{(n)}\left( \left\lbrack A_{m}(ɛ)x \right\rbrack_{i},ɛ \right) = q_{mi}^{(n)}\left( \sum\limits_{k = 1}^{K_{b}}\pi_{mik}(ɛ)u_{mik}^{(n)}\left( x_{k},ɛ \right),ɛ \right) \leq \sum\limits_{k = 1}^{K_{b}}\pi_{mik}(ɛ)q_{mi}^{(n)}\left( u_{mik}^{(n)}\left( x_{k},ɛ \right),ɛ \right).$

This exchange “brings out the sum over k.” Combining this inequality with (54) leads to the following final surrogate function: $\begin{matrix} {\varphi_{3}^{(n)}(x)\overset{\Delta}{=}\sum\limits_{m = 1}^{N_{s}}\sum\limits_{i = 1}^{N_{d}}\int\frac{I_{mi}(ɛ)}{b_{mi}^{(n)}(ɛ)}\left\lbrack \sum\limits_{k = 1}^{K_{b}}\pi_{mik}(ɛ)q_{mi}^{(n)}\left( u_{mik}^{(n)}\left( x_{k},ɛ \right),ɛ \right) \right\rbrack\, dɛ.} & (58) \end{matrix}$

The surrogate φ₃ ^((n)) is convenient because it is separable: ${\varphi_{3}^{(n)}(x)} = {\sum\limits_{k = 1}^{K_{b}}{\varphi_{3,k}^{(n)}\left( x_{k} \right)}}$

where $\begin{matrix} {\varphi_{3,k}^{(n)}\left( x_{k} \right)\overset{\Delta}{=}\sum\limits_{m = 1}^{N_{s}}\sum\limits_{i = 1}^{N_{d}}\int\frac{I_{mi}(ɛ)}{b_{mi}^{(n)}(ɛ)}\pi_{mik}(ɛ)q_{mi}^{(n)}\left( u_{mik}^{(n)}\left( x_{k},ɛ \right),ɛ \right)\, dɛ.} & (59) \end{matrix}$

Combining separability with (45) leads to the following fully parallelizable algorithm: ${x_{k}^{({n + 1})} = {\arg\min\limits_{x_{k}}{\varphi_{3,k}^{(n)}\left( x_{k} \right)}}},\quad {k = 1},\ldots,{K_{b}.}$

As always, one must consider any constraints such as non-negativity. Since φ_(3,k) ^((n)) is a quadratic function, it is trivial to minimize as follows: $\begin{matrix} {x_{k}^{({n + 1})} = {\left\lbrack {x_{k}^{(n)} - \frac{{{\frac{\partial\quad}{\partial x_{k}}\quad \varphi_{3,k}^{(n)}\quad \left( x_{k} \right)}}_{x_{k} = x_{k}^{(n)}}}{{{\frac{{\partial\,^{2}}\quad}{\partial x_{k}^{2}}\quad \varphi_{3,k}^{(n)}\quad \left( x_{k} \right)}}_{x_{k} = x_{k}^{(n)}}}} \right\rbrack_{+}.}} & (60) \end{matrix}$

From (59) and (55), the partial derivatives needed are: $\begin{matrix} {\left. \frac{\partial}{\partial x_{k}}\varphi_{3,k}^{(n)}\left( x_{k} \right) \right|_{x_{k} = x_{k}^{(n)}} = \sum\limits_{m = 1}^{N_{s}}\sum\limits_{i = 1}^{N_{d}}\int\frac{I_{mi}(ɛ)}{b_{mi}^{(n)}(ɛ)}a_{mik}(ɛ){\overset{.}{q}}_{mi}^{(n)}\left( \left\lbrack A_{m}(ɛ)x^{(n)} \right\rbrack_{i},ɛ \right)\, dɛ} \\ {= \left. \frac{\partial}{\partial x_{k}}\Psi(x) \right|_{x = x^{(n)}}} \end{matrix}$

and $\left. \frac{\partial^{2}}{\partial x_{k}^{2}}\varphi_{3,k}^{(n)}\left( x_{k} \right) \right|_{x_{k} = x_{k}^{(n)}} = \sum\limits_{m = 1}^{N_{s}}\sum\limits_{i = 1}^{N_{d}}\int\frac{\left( a_{mik}(ɛ) \right)^{2}}{\pi_{mik}(ɛ)}w_{mi}^{(n)}(ɛ)\, dɛ.$

A useful choice for the π_(mik)(ε)'s is: $\pi_{mik}(ɛ) = \frac{a_{mik}(ɛ)}{a_{mi}(ɛ)},$

where $\begin{matrix} {a_{mi}(ɛ)\overset{\Delta}{=}\sum\limits_{k = 1}^{K_{b}}a_{mik}(ɛ).} & (61) \end{matrix}$

Substituting into (60) yields the following algorithm: $\begin{matrix} {x_{k}^{({n + 1})} = \left\lbrack x_{k}^{(n)} - \frac{\left. \frac{\partial}{\partial x_{k}}\Psi(x) \right|_{x = x^{(n)}}}{d_{k}^{(n)}} \right\rbrack_{+},} & (62) \end{matrix}$

for k=1, . . . ,K_(b), where $\begin{matrix} {d_{k}^{(n)}\overset{\Delta}{=}\sum\limits_{m = 1}^{N_{s}}\sum\limits_{i = 1}^{N_{d}}\int a_{mik}(ɛ)a_{mi}(ɛ)w_{mi}^{(n)}(ɛ)\, dɛ,} & (63) \end{matrix}$

and $\frac{\partial\quad}{\partial x_{k}}\quad \Psi \quad \left( x^{(n)}\quad \right)$

was defined in (42). This is a diagonally-scaled gradient-descent algorithm that is guaranteed to monotonically decrease the cost function each iteration.
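The update (62) with denominators (63) can be sketched in a few lines once the energy integral is replaced by a sum over a uniform energy grid. The following is only a minimal illustration under stated assumptions: the array layout, the function name `sps_update`, and the grid spacing `d_eps` are hypothetical and are not part of the original description, and the non-negativity clamp follows (60).

```python
import numpy as np

def sps_update(x, grad, a, w, d_eps):
    """One separable-paraboloid update, a discretized reading of (62)-(63).

    x     : (K,) current coefficients x^(n)
    grad  : (K,) gradient of the cost function at x^(n), as in (42)
    a     : (M, I, K, E) samples of a_mik(eps) on an energy grid (assumed layout)
    w     : (M, I, E) samples of w_mi^(n)(eps)
    d_eps : spacing of the (assumed uniform) energy grid
    """
    # (61): a_mi(eps) = sum over k of a_mik(eps)
    a_mi = a.sum(axis=2)
    # (63): d_k = sum_m sum_i int a_mik(eps) a_mi(eps) w_mi(eps) d_eps
    d = np.einsum("mike,mie,mie->k", a, a_mi, w) * d_eps
    # Guard against zero denominators (coefficients no ray touches).
    step = np.divide(grad, d, out=np.zeros_like(grad), where=d > 0)
    # [.]_+ enforces the non-negativity constraint.
    return np.maximum(x - step, 0.0)
```

All coefficients are updated simultaneously from the same gradient, which is what makes the method parallelizable.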

Both the first derivatives (42) and the denominators {d_(k) ^((n))} require integrals over energy. These integrals can be computed by standard discrete approximations, to whatever accuracy the source spectrum and material properties are known. This is fundamentally different than making linear or polynomial approximations in the model at the outset.
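As an illustration of such a discrete approximation, the sketch below evaluates a polyenergetic measurement mean by a Riemann sum over a uniform energy grid. It assumes a mean of the form ∫I(ε)exp(−[β₁(ε)s₁+β₂(ε)s₂])dε + r for a single ray; the function name and argument layout are hypothetical.

```python
import numpy as np

def mean_measurement(spectrum, beta1, beta2, s1, s2, d_eps, r=0.0):
    """Riemann-sum approximation of the polyenergetic measurement mean.

    spectrum     : (E,) samples of the incident spectrum I(eps)
    beta1, beta2 : (E,) samples of the two mass attenuation curves
    s1, s2       : component line integrals for one ray
    d_eps        : spacing of the (assumed uniform) energy grid
    r            : additive background (e.g. scatter), assumed known
    """
    attenuation = np.exp(-(beta1 * s1 + beta2 * s2))
    return float(np.sum(spectrum * attenuation) * d_eps + r)
```

The accuracy is set by the energy grid and by how well the spectrum and attenuation curves are known; no linearization of the exponential is involved.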

To make a more practical algorithm, it would probably be reasonable to adopt the “precomputed fast denominator” idea and to apply the ordered subsets principles. On their own, these modifications forfeit the monotonicity guarantee, but they have proven to be useful in practice. By applying suitable relaxation, one could even guarantee convergence if the cost function were convex.

FIG. 1a shows a flow chart of the method or algorithm of the present invention. Initially, two or more raw sinograms are obtained such as from a detector of FIG. 1b. Calibration data for the source spectra are also provided.

Component line integrals and error covariances are estimated utilizing the raw sinograms and the calibration data.

FBP reconstruction of component images is then performed utilizing the component line integrals. Then, optionally, component images are created utilizing iterative WLS reconstruction utilizing the component line integrals, component error covariances and the component images.

Then, either the component images obtained by FBP reconstruction or the component images constructed by iterative WLS are chosen to reproject in the next step which reprojects the chosen components to estimate line integrals.

Then, measurement means and gradients are computed.

The measurement means and gradients are utilized by a backprojection process to compute the cost function gradient.

The component images are updated using correction factors. Component constraints are applied such as non-negativity.

The number of iterations is checked against a predetermined number, or other criteria related to the component images and data are considered. If the iterative part of the method is complete, the final component images are displayed and/or analyzed. If not, the iterative part of the method is re-entered at the choosing step.

Alternatively, after reconstruction of the component images is performed by the iterative WLS reconstruction step, the component images may be displayed and/or analyzed as indicated by the dashed line in FIG. 1a.

FIG. 1b is a simplified schematic block diagram showing how an image reconstructor of the present invention interacts with the subsystems of a simple CT system to perform the method of FIG. 1a.

F. ML Dual-Energy Case

Perhaps the most common application of this algorithm will be to dual-energy X-ray CT reconstruction with a two-component separable basis of the following form (cf. (3)): ${\chi_{k}\left( {\overset{\rightarrow}{x},ɛ} \right)} = \left\{ \begin{matrix} {{{\beta_{1}(ɛ)}{b_{k}\left( \overset{\rightarrow}{x} \right)}},} & {{k = 1},\ldots \quad,N_{p}} \\ {{{\beta_{2}(ɛ)}{b_{k - N_{p}}\left( \overset{\rightarrow}{x} \right)}},} & {{k = {N_{p} + 1}},\ldots \quad,{2N_{p}},} \end{matrix} \right.$

with K=2N_(p), for some spatial basis $\{ b_{j}(\overset{\rightarrow}{x})\}_{j = 1}^{N_{p}}$ such as pixels, where β₁(ε) and β₂(ε) denote the spectral properties of the two basis materials. For simplicity, also assume that the measured ray configuration is the same for both incident spectra, i.e., L_(mi)=L_(i) for m=1,2. In this case, ${a_{mik}(ɛ)} = \left\{ \begin{matrix} {{{\beta_{1}(ɛ)}a_{ik}},} & {{k = 1},\ldots \quad,N_{p}} \\ {{{\beta_{2}(ɛ)}a_{i,{k - N_{p}}}},} & {{k = {N_{p} + 1}},\ldots \quad,{2N_{p}},} \end{matrix} \right.$

where a_(ij) was defined in (9). In other words,

A _(m)(ε)x=β₁(ε)Ax ₁+β₂(ε)Ax ₂,  (64)

where x=(x₁,x₂) and where x₁ and x₂ represent the basis coefficients (voxel values) for the two component images. This expression is independent of m. Furthermore, from (61)

 a _(mi)(ε)=[β₁(ε)+β₂(ε)]a _(i),

where $a_{i}\overset{\Delta}{=}{\sum\limits_{j = 1}^{N_{p}}{a_{ij}.}}$

With these simplifications, the measurement mean model (24) becomes

$\begin{matrix} {{\overset{\_}{y}}_{mi}(x) = I_{mi}e^{- f_{im}\left( {\lbrack{Ax_{1}}\rbrack}_{i},{\lbrack{Ax_{2}}\rbrack}_{i} \right)} + r_{mi}} & (65) \end{matrix}$

where ƒ_(im) and I_(mi) were defined above.

Given incident source spectra I_(mi)(ε), one can precompute and tabulate ƒ_(im) for m=1,2. Since ƒ_(im) is nearly linear, a modest table combined with bilinear interpolation is effective. (For monoenergetic sources, ƒ_(im) would be exactly linear.) Alternatively, a polynomial approximation to ƒ_(im) can be used.
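The tabulation and bilinear interpolation described above might be sketched as follows. This is an illustration only: it assumes ƒ(s₁,s₂) is the negative logarithm of the normalized polyenergetic transmission, which is consistent with (65) but is restated here as an assumption, and the function names and uniform grids are hypothetical.

```python
import numpy as np

def f_exact(spectrum, beta1, beta2, s1, s2):
    """Assumed form: f = -log( sum_e I_e exp(-(b1_e s1 + b2_e s2)) / sum_e I_e )."""
    transmitted = np.sum(spectrum * np.exp(-(beta1 * s1 + beta2 * s2)))
    return -np.log(transmitted / np.sum(spectrum))

def build_table(spectrum, beta1, beta2, s1_grid, s2_grid):
    """Tabulate f on the Cartesian product of two uniform grids."""
    return np.array([[f_exact(spectrum, beta1, beta2, s1, s2)
                      for s2 in s2_grid] for s1 in s1_grid])

def f_bilinear(table, s1_grid, s2_grid, s1, s2):
    """Bilinear interpolation of the table at (s1, s2)."""
    h1 = s1_grid[1] - s1_grid[0]
    h2 = s2_grid[1] - s2_grid[0]
    # Index of the lower-left corner of the enclosing grid cell.
    i = int(np.clip((s1 - s1_grid[0]) // h1, 0, len(s1_grid) - 2))
    j = int(np.clip((s2 - s2_grid[0]) // h2, 0, len(s2_grid) - 2))
    t = (s1 - s1_grid[i]) / h1
    u = (s2 - s2_grid[j]) / h2
    return ((1 - t) * (1 - u) * table[i, j] + t * (1 - u) * table[i + 1, j]
            + (1 - t) * u * table[i, j + 1] + t * u * table[i + 1, j + 1])
```

For a single-energy spectrum the table is exactly linear, so the interpolation is exact; for a broad spectrum the interpolation error shrinks with the grid spacing.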

The algorithm (62) also requires the partial derivatives of the cost function. Using (64), the partial derivatives (42) simplify to $\begin{matrix} {{{\frac{\partial}{\partial x_{lj}}{\Psi (x)}} = {\sum\limits_{i = 1}^{N_{d}}{a_{ij}\left\lbrack {\sum\limits_{m = 1}^{N_{s}}{{{\overset{.}{\psi}}_{mi}\left( {{\overset{\_}{y}}_{mi}(x)} \right)}{{\overset{\sim}{G}}_{iml}\left( {\left\lbrack {Ax}_{1} \right\rbrack_{i},\left\lbrack {Ax}_{2} \right\rbrack_{i}} \right)}\left( {{{\overset{\_}{y}}_{mi}(x)} - r_{mi}} \right)}} \right\rbrack}}},} & (66) \end{matrix}$

for j=1, . . . ,N_(p) and l=1,2, where x_(lj) denotes the jth component of x_(l), and where {tilde over (G)}_(iml)(s₁,s₂) denotes the derivative of ƒ_(im)(s₁,s₂) with respect to s_(l). Thus, the algorithm (62) becomes: $\begin{matrix} {{x_{lj}^{({n + 1})} = \left\lbrack {x_{lj}^{(n)} - \frac{\frac{\partial}{\partial x_{lj}}{\Psi \left( x^{(n)} \right)}}{d_{lj}^{(n)}}} \right\rbrack_{+}},} & (67) \end{matrix}$

for j=1, . . . ,N_(p), l=1,2, where $\frac{\partial}{\partial x_{lj}}\Psi$

is defined in (66), and from (63) $\begin{matrix} {d_{lj}^{(n)} = \sum\limits_{i = 1}^{N_{d}}a_{ij}a_{i}\sum\limits_{m = 1}^{N_{s}}\int{\overset{\sim}{\beta}}_{l}(ɛ)w_{mi}^{(n)}(ɛ)\, dɛ,} & (68) \end{matrix}$

where ${{\overset{\sim}{\beta}}_{l}(ɛ)}\overset{\Delta}{=}{{\beta_{l}(ɛ)}{\left( {{\beta_{1}(ɛ)} + {\beta_{2}(ɛ)}} \right).}}$

Since the integral in (68) depends on the current iterate x^((n)), it cannot be precomputed and tabulated. One integral per ray is needed, followed by one backprojection per material component. Thus, the guarantee of intrinsic monotonicity requires roughly a 50% increase in computation per iteration over an algorithm in which the d_(lj)'s are precomputed. In similar previous studies, it has been observed that one can use precomputed approximations to the d_(lj) ^((n))'s yet preserve monotonicity almost always. This observation motivates the following hybrid approach: we first use precomputed d_(lj) ^((n))'s each iteration, and then check the cost function to verify that it decreased. If not, then the update is recomputed using the d_(lj) ^((n))'s that intrinsically guarantee a monotone decrease. Only very rarely is this recomputation required, so the resulting “hybrid” method is fast and practical yet also guaranteed to be monotone. When speed is considered more important than monotonicity, one can also easily apply ordered subsets to (67) by downsampling the sums over “i” in the derivative expressions.
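The hybrid approach described above can be sketched as a single step that tries the precomputed denominators first and falls back only when the cost fails to decrease. This is a minimal illustration: `cost`, `grad`, and `d_monotone` are hypothetical caller-supplied callables standing in for Ψ, its gradient (66), and the denominators (68).

```python
import numpy as np

def hybrid_step(x, cost, grad, d_fast, d_monotone):
    """One hybrid update in the form of (67).

    x          : current non-negative iterate
    cost       : callable returning the cost function value (assumed supplied)
    grad       : callable returning the cost gradient (assumed supplied)
    d_fast     : precomputed denominators (array)
    d_monotone : callable returning denominators that guarantee a decrease
    Returns the new iterate and a flag telling whether the fallback was used.
    """
    g = grad(x)
    x_try = np.maximum(x - g / d_fast, 0.0)
    if cost(x_try) <= cost(x):
        return x_try, False            # fast step accepted
    # Rare case: recompute with the monotone denominators.
    x_safe = np.maximum(x - g / d_monotone(x), 0.0)
    return x_safe, True
```

The extra cost evaluation is paid every iteration, while the more expensive denominators are computed only when the check fails.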

G. Precomputed Curvatures

To reduce computation per iteration (at the price of losing the monotonicity guarantee), precomputed values may be used for the denominators d_(lj). Observe that c_(mi) is the curvature of q_(mi) in (33), which is a parabola surrogate for g_(mi) in (29). A reasonable approximation for c_(mi) ^((n))(ε) is to use the curvature of g_(mi) ^((n)) at its minimizer, i.e.,

$c_{mi}^{(n)}(ɛ) \approx \left. {\overset{¨}{g}}_{mi}\left( l,x^{(n)},ɛ \right) \right|_{l = \arg\min_{l}g_{mi}^{(n)}(l,ɛ)},$

where ${{\overset{¨}{g}}_{mi}\left( {l,x,ɛ} \right)}\overset{\Delta}{=}{\frac{\partial^{2}}{\partial l^{2}}{{g_{mi}\left( {l,x,ɛ} \right)}.}}$

From (29), for l near the minimizer of g_(mi):

{umlaut over (g)} _(mi)(l,x,ε)≈[−b _(mi)(x,ε)e ^(−l)]² {umlaut over (h)} _(mi)(b _(mi)(x,ε)e ^(−l) +r _(mi)(x,ε),x,ε).

The minimizer of ψ_(mi)(y) and h_(mi)(y,{overscore (y)}_(mi)(x^((n)))) should be approximately where y=y_(mi). (This holds exactly for the Poisson noise model.) So the minimizer of g_(mi) ^((n)) should be approximately where b_(mi)(x,ε)e^(−l)+r_(mi)(x,ε)=y_(mi). If ψ_(mi) is nearly convex, then h_(mi) and ψ_(mi) should have approximately the same curvatures. Combining these heuristics with the assumption that r_(mi) is negligible yields the approximation

c _(mi) ^((n))(ε)≈y _(mi) ²{umlaut over (ψ)}_(mi)(y _(mi))

If ψ_(mi) corresponds to a statistical model that is approximately Poisson, then {umlaut over (ψ)}_(mi)(y_(mi))≈1/y_(mi). Thus, we use the approximation c_(mi) ^((n))(ε)≈y_(mi) hereafter, which we substitute into (68). To further simplify, we replace the {tilde over (β)}_(l)(ε) values in (68) by their values at the effective energy of the mth incident spectrum: $\begin{matrix} {{\overset{\_}{ɛ}}_{m}\overset{\Delta}{=}\frac{\int ɛ\, I_{mi}(ɛ)\, dɛ}{\int I_{mi}(ɛ)\, dɛ},\quad m = 1,2,} & (69) \end{matrix}$

yielding the approximation $\int{\overset{\sim}{\beta}}_{l}(ɛ)\frac{I_{mi}(ɛ)}{b_{mi}^{(n)}(ɛ)}\, dɛ \approx {\overset{\sim}{\beta}}_{l}\left( {\overset{\_}{ɛ}}_{m} \right)\int\frac{I_{mi}(ɛ)}{b_{mi}^{(n)}(ɛ)}\, dɛ = {\overset{\sim}{\beta}}_{l}\left( {\overset{\_}{ɛ}}_{m} \right),$

using (30). Thus, the final precomputed approximation to d_(lj) ^((n)) is $\begin{matrix} {d_{lj}\overset{\Delta}{=}{\sum\limits_{i = 1}^{N_{d}}{a_{ij}a_{i}{\sum\limits_{m = 1}^{N_{s}}{y_{mi}{{{\overset{\sim}{\beta}}_{l}\left( {\overset{\_}{ɛ}}_{m} \right)}.}}}}}} & (70) \end{matrix}$

Precomputing (70) requires one object-independent forward projection (to compute the a_(i)'s), and two measurement-dependent backprojections (one for each l).
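The precomputation in (69) and (70) can be sketched with dense arrays as follows. This is an illustration under assumptions: the system matrix is stored as a dense array `A`, the energy grid is uniform, and the function names and argument layout are hypothetical.

```python
import numpy as np

def effective_energy(energies, spectrum):
    """(69): spectrum-weighted mean energy (a uniform grid spacing cancels)."""
    return float(np.sum(energies * spectrum) / np.sum(spectrum))

def precomputed_denominators(A, y, beta_tilde_eff):
    """(70): d_lj = sum_i a_ij a_i sum_m y_mi * beta_tilde_l(effective energy m).

    A              : (N_d, N_p) system matrix with entries a_ij
    y              : (N_s, N_d) measured sinograms y_mi
    beta_tilde_eff : (N_s, 2), entry [m, l] is beta_tilde_l at effective energy m
    returns        : (2, N_p) array of d_lj
    """
    a_i = A.sum(axis=1)                  # object-independent forward projection
    ray_weights = beta_tilde_eff.T @ y   # (2, N_d): sum over the spectra m
    return (ray_weights * a_i) @ A       # one backprojection per material l
```

Because nothing here depends on the current iterate, the result can be computed once before the iterations begin.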

H. Examining ƒ_(im) Via a Linearizing Transformation

Since ${{\frac{\partial\quad}{\partial s_{l}}\quad f_{im}\quad \left( {0,0} \right)} = {\overset{\_}{\beta}}_{m\quad l}},$

we define $\begin{matrix} {\begin{bmatrix} F_{1i}^{*} \\ F_{2i}^{*} \end{bmatrix} = {{\begin{bmatrix} \overset{\_}{\beta_{11}} & \overset{\_}{\beta_{12}} \\ \overset{\_}{\beta_{21}} & \overset{\_}{\beta_{22}} \end{bmatrix}^{- 1}\begin{bmatrix} F_{1i} \\ F_{2i} \end{bmatrix}}.}} & (71) \end{matrix}$

The effect of this transformation is that ${{\frac{\partial\quad}{\partial s_{l}}\quad F_{m\quad i}^{*}\quad \left( {0,0} \right)} = 1_{l = m}},$

where 1 denotes the indicator function.
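The linearizing transformation (71) amounts to solving a 2×2 linear system for each ray. A minimal sketch, assuming the values for the two spectra are stacked as rows of an array (the function name is hypothetical):

```python
import numpy as np

def linearize(F, beta_bar):
    """(71): apply the inverse of the 2x2 matrix [beta_bar_ml] to (F_1i, F_2i).

    F        : (2, N) values for the two incident spectra, one column per ray
    beta_bar : (2, 2) matrix whose entry [m, l] is beta_bar_ml
    """
    # Solving the system avoids forming the explicit inverse.
    return np.linalg.solve(beta_bar, F)
```

In the monoenergetic limit, where the values are exactly beta_bar times the component line integrals, the transformation returns those line integrals unchanged; departures from that in the polyenergetic case reflect beam hardening.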

Results

A. Computer Simulation

To evaluate the feasibility of the proposed approach, a computer simulation of dual-energy polyenergetic CT scans was performed. The conventional dual-energy FBP reconstruction method, the proposed polyenergetic WLS method, and the proposed ML reconstruction method were all applied to those simulated measurements.

FIGS. 2a-2b show the source spectra for the two simulated source voltages (80 kVp and 140 kVp). The dashed lines in the Figures show the effective energies of these spectra, as defined by Equation (69).

The separable parameterization of Equation (21) was used. FIG. 3 shows the mass attenuation coefficients β_(l)(ε) of bone mineral and soft tissue from NIST web pages:

http://physics.nist.gov/PhysRefData/XrayMassCoef/ComTab/tissue.html.

FIGS. 4a and 4b show the function ƒ_(im) defined in (11). These functions look to be nearly linear, but they are not quite linear due to beam hardening.

FIG. 5 shows the linearized form (71) for uniform samples in s₁ and s₂. Near zero the values are nearly uniform (due to the Jacobian) but away from zero there are nonlinearities due to beam hardening.

FIGS. 6a-6d show the object x^(true) used in the computer simulation. The units of x are density (g/cm³) and were assigned to 1.0 for soft tissue, 0.2 for lungs, 1.9 for spine, and 2.0 for ribs. The lungs and soft tissue had the “soft tissue” characteristics shown in FIG. 3, and the spine and ribs had the “bone” characteristics of FIG. 3. The images were 128×104 and the pixel size was 0.32 cm.

We simulated dual-energy measurements {overscore (y)}_(mi) using (24) and the spectra shown in FIGS. 2a and 2b. The sinograms had 140 radial samples (parallel beam geometry with 0.32 cm sample spacing) and 128 angles over 180°. To the noiseless sinograms {overscore (y)}_(mi) we added pseudo-random Poisson distributed noise corresponding to a total of 47M recorded photons (chosen arbitrarily to represent a moderately low SNR case where FBP yields noticeable streaks). FIGS. 7a and 7b show the simulated sinograms Y_(mi).
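A noise simulation of this kind might be sketched as below. The scaling convention (normalizing the noiseless means so that they sum to the target photon count before sampling) and the function name are assumptions made for illustration; the description above does not specify how the 47M total was imposed.

```python
import numpy as np

def add_poisson_noise(ybar, total_counts, seed=0):
    """Scale noiseless means to a target total count, then draw Poisson samples.

    ybar         : array of noiseless measurement means (any shape)
    total_counts : desired expected total number of recorded photons
    seed         : seed for the pseudo-random generator (reproducibility)
    """
    rng = np.random.default_rng(seed)
    scaled = ybar * (total_counts / ybar.sum())
    return rng.poisson(scaled)
```

For example, `add_poisson_noise(ybar, 47e6)` applied to a stacked array of two 128×140 sinograms yields integer counts whose total is close to 47 million.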

We first show the results of conventional dual-energy tomographic reconstruction. FIGS. 8a and 8b show the estimates {circumflex over (ƒ)}_(im) described in (14) as computed from the noisy dual-energy sinograms, using a small amount of radial smoothing to help control noise. FIGS. 9a-9f show the estimates ŝ_(i) described in (17) as computed from the {circumflex over (ƒ)}_(im)'s. FIGS. 9a-9f also show the error sinograms. There is substantial error because the inversion in (17) is a noise-amplifying step due to the similarities between the two spectra in FIGS. 2a and 2b and the similarities of the mass attenuation coefficients in FIG. 3.

FIGS. 10a-10f show ramp-filtered FBP reconstructions of the soft-tissue and bone components from the ŝ_(i)'s shown in FIGS. 9a-9f. FIGS. 10a-10f also show the error images. Substantial noise propagates from the ŝ_(i)'s into the FBP images.

FIGS. 11a-11f show PL dual-energy reconstruction of soft tissue and bone components. The density units are 1/cm.

FIGS. 10a-10f and 11a-11f illustrate the current tradeoff that faces X-ray CT imaging. Ordinary single-energy methods are inaccurate, whereas FBP-based dual-energy methods are unacceptably noisy.

Although the algorithm description in the preceding sections did not include regularization, it is straightforward to include regularization in this algorithm. A penalized-likelihood extension of the ML dual-energy method was implemented. In our implementation, we used pairwise pixel differences with a Huber potential function and a second-order neighborhood for the regularizing penalty function. The strength parameter was modulated to help provide resolution uniformity (away from object boundaries). Many other regularization methods could be used instead.
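A penalty of the kind described above can be sketched as follows for a single component image. This is not the implementation used for the reported results: the 1/√2 weighting of diagonal neighbors, the single global `strength` (rather than a spatially modulated one), and the function names are assumptions made for illustration.

```python
import numpy as np

def huber(t, delta):
    """Huber potential: quadratic for |t| <= delta, linear beyond."""
    a = np.abs(t)
    return np.where(a <= delta, 0.5 * t**2, delta * a - 0.5 * delta**2)

def roughness_penalty(img, delta, strength=1.0):
    """Sum of Huber potentials over pairwise pixel differences in a
    second-order (8-neighbor) neighborhood of a 2D image."""
    diag_weight = 1.0 / np.sqrt(2.0)   # assumed down-weighting of diagonal pairs
    pairs = [
        (img[1:, :] - img[:-1, :], 1.0),             # vertical neighbors
        (img[:, 1:] - img[:, :-1], 1.0),             # horizontal neighbors
        (img[1:, 1:] - img[:-1, :-1], diag_weight),  # diagonal neighbors
        (img[1:, :-1] - img[:-1, 1:], diag_weight),  # anti-diagonal neighbors
    ]
    return strength * sum(wt * huber(d, delta).sum() for d, wt in pairs)
```

The penalty is zero for a uniform image and grows only linearly across large jumps, which is what lets edges survive the smoothing.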

FIG. 12 shows the cost function changes Ψ(x^((n)))−Ψ(x⁽⁰⁾) versus iteration for a 1 subset version of the algorithm with precomputed curvatures as described in (68). The 4 subset version “converges” nearly 4 times faster than the 1 subset version. This acceleration factor is typical for ordered-subset methods.

Conclusion

A statistical method for reconstructing dual-energy X-ray CT images is described herein. The method is applicable to related tomographic imaging problems having energy diversity. The method can accommodate a very wide variety of statistical models and is likely to be sufficiently general to cover all useful choices since the mathematical assumptions on ψ_(mi) are quite flexible.

Undoubtedly, many simplifications are possible for special cases of the above general framework. For simplicity, we have used an approximate physical model that ignores the nonlinearity caused by the exponential edge-gradient effect. Using optimization transfer methods similar to those used here, one could extend the algorithm derivation to account for this effect. Other blurring effects like detector after-glow, finite X-ray focal spot size, flying focal spot, and detector response could also be included.

Another extension would be to incorporate Compton scatter into the measurement model since a basis material formulation should facilitate model-based scatter approaches such as those used successfully in PET. The importance of scatter is well known.

Using a dual-energy approach eliminates the need for beam-hardening corrections, and the use of statistical methods also reduces metal artifacts.

As derived here, our final surrogate function is completely separable, both spatially and spectrally (in terms of the component density coefficients x_(lj)). It may be preferable to modify the derivation to invoke only spatial separation but leave the spectral components coupled, since they are jointly constrained and so should be considered together.

The method is potentially applicable to all known dual-energy CT problems, including attenuation correction for PET-CT systems.

The invention relates to a method for statistical tomographic reconstruction from dual-energy X-ray CT scans or other sets of two or more transmission scans with differing energy spectra. This method has potential application in a variety of X-ray (and polyenergetic gamma ray) imaging applications, including medical X-ray CT, non-destructive evaluation in manufacturing, security purposes like baggage inspection, and accurate attenuation correction for PET scans from X-ray CT images.

Existing dual-energy methods are largely non-statistical and lead to unacceptably noisy images that have impeded their commercial adoption. Our statistical approach controls the noise by using more sophisticated modern iterative algorithms. This method extends the previously developed algorithm described in U.S. Pat. No. 6,507,633 for reconstruction from single X-ray CT scans to the case of dual-energy scans.

While embodiments of the invention have been illustrated and described, it is not intended that these embodiments illustrate and describe all possible forms of the invention. Rather, the words used in the specification are words of description rather than limitation, and it is understood that various changes may be made without departing from the spirit and scope of the invention. 

What is claimed is:
 1. A method for statistically reconstructing images from a plurality of transmission measurements having energy diversity, the method comprising: providing a plurality of transmission measurements having energy diversity; and processing the measurements with an algorithm based on a statistical model which accounts for the energy diversity to obtain at least one final component image which has reduced noise.
 2. The method as claimed in claim 1, further comprising providing a cost function based on the statistical model wherein the cost function is minimized during the step of processing.
 3. The method as claimed in claim 2, wherein the cost function has a gradient which is calculated during the step of processing.
 4. The method as claimed in claim 3, wherein the gradient is calculated by backprojecting.
 5. The method as claimed in claim 1, further comprising analyzing the at least one final component image.
 6. The method as claimed in claim 1, further comprising calibrating spectra of the measurements to obtain calibration data wherein the step of processing utilizes the calibration data.
 7. The method as claimed in claim 1, further comprising displaying the at least one final component image.
 8. The method as claimed in claim 3, wherein the gradient is calculated by approximately using a subset of the measurements, such as an ordered subset of projection views, to accelerate the algorithm.
 9. The method as claimed in claim 2, wherein the cost function has a regularizing penalty term.
 10. The method as claimed in claim 1, wherein the measurements are dual-energy X-ray CT scans.
 11. The method as claimed in claim 1, wherein the measurements are transmission scans with differing energy spectra, such as X-ray sources with different tube voltages or different filtrations, or gamma-ray sources with multiple energies.
 12. The method as claimed in claim 2, wherein the cost function includes a log-likelihood term.
 13. The method as claimed in claim 2, wherein the cost function consists solely of a log-likelihood function, which is called maximum likelihood reconstruction, or wherein the cost function consists of both a log-likelihood function and a regularizing penalty function, which is called penalized-likelihood or maximum a posteriori image reconstruction.
 14. The method as claimed in claim 1, further comprising preprocessing the measurements prior to the step of processing to obtain preprocessed measurements and wherein the preprocessed measurements are processed in the step of processing to obtain the at least one component image.
 15. The method as claimed in claim 12, wherein the log likelihood term is a function that depends on a model for an ensemble mean of the transmission measurements, and the model incorporates characteristics of an energy spectrum.
 16. The method as claimed in claim 12, wherein the log-likelihood term is a function of the transmission measurements, prior to any pre-processing such as taking a logarithm of the measurements.
 17. The method as claimed in claim 3, wherein the gradient of the cost function is calculated using a parametric approximation, such as polynomials, tables, or piecewise polynomials.
 18. The method as claimed in claim 9, wherein the regularizing penalty term is based on quadratic functions of linear combinations of voxel values or nonquadratic (edge-preserving) functions of such combinations.
 19. The method as claimed in claim 2, wherein parameter constraints such as non-negativity of voxel values are enforced during or after minimization of the cost function.
 20. The method as claimed in claim 14, wherein the processing step is based on the preprocessed measurements and uses a cost function based on a statistical model for variability of the preprocessed measurements.
 21. An image reconstructor apparatus for statistically reconstructing images from a plurality of transmission measurements having energy diversity, the apparatus comprising: means for providing a plurality of transmission measurements having energy diversity; and means for processing the measurements with an algorithm based on a statistical model which accounts for the energy diversity to obtain at least one final component image which has reduced noise.
 22. The apparatus as claimed in claim 21, further comprising means for providing a cost function based on the statistical model wherein the cost function is minimized by the means for processing.
 23. The apparatus as claimed in claim 22, further comprising means for calculating a gradient of the cost function.
 24. The apparatus as claimed in claim 23, wherein the means for calculating calculates the gradient by backprojecting.
 25. The apparatus as claimed in claim 21, further comprising means for analyzing the at least one final component image.
 26. The apparatus as claimed in claim 21, further comprising means for calibrating spectra of the measurements to obtain calibration data wherein the means for processing utilizes the calibration data.
 27. The apparatus as claimed in claim 21, further comprising a display for displaying the at least one final component image.
 28. The apparatus as claimed in claim 23, wherein the means for calculating calculates the gradient approximately using a subset of the measurements, such as an ordered subset of projection views, to accelerate the algorithm.
 29. The apparatus as claimed in claim 22, wherein the cost function has a regularizing penalty term.
 30. The apparatus as claimed in claim 21, wherein the measurements are transmission scans with differing energy spectra, such as X-ray sources with different tube voltages or different filtrations, or gamma-ray sources with multiple energies.
 31. The apparatus as claimed in claim 21, wherein the cost function includes a log-likelihood term.
 32. The apparatus as claimed in claim 22, wherein the cost function consists solely of a log-likelihood function, which is called maximum likelihood reconstruction, or wherein the cost function consists of both a log-likelihood function and a regularizing penalty function, which is called penalized-likelihood or maximum a posteriori image reconstruction.
 33. The apparatus as claimed in claim 22, wherein the cost function includes a maximum likelihood or penalized likelihood algorithm.
 34. The apparatus as claimed in claim 21, further comprising means for preprocessing the measurements to obtain preprocessed measurements wherein the preprocessed measurements are processed by the means for processing to obtain the at least one component image.
 35. The apparatus as claimed in claim 31, wherein the log likelihood term is a function that depends on a model for an ensemble mean of the transmission measurements, and the model incorporates characteristics of an energy spectrum.
 36. The apparatus as claimed in claim 31, wherein the log-likelihood term is a function of the transmission measurements, prior to any preprocessing such as taking a logarithm of the measurements.
 37. The apparatus as claimed in claim 23, wherein the gradient of the cost function is calculated using a parametric approximation, such as polynomials, tables, or piecewise polynomials.
 38. The apparatus as claimed in claim 29, wherein the regularizing penalty term is based on quadratic functions of linear combinations of voxel values or nonquadratic (edge-preserving) functions of such combinations.
 39. The apparatus as claimed in claim 22, wherein parameter constraints such as non-negativity of voxel values are enforced during or after minimization of the cost function.
 40. The apparatus as claimed in claim 34, wherein the means for processing processes the preprocessed measurements and uses a cost function based on a statistical model for variability of the preprocessed measurements. 